Single Molecule Statistics and the Polynucleotide Unzipping Transition 

David K. Lubensky and David R. Nelson

Department of Physics, Harvard University 

Cambridge MA 02138 

We present an extensive theoretical investigation of the mechanical unzipping of double-stranded 
DNA under the influence of an applied force. In the limit of long polymers, there is a thermodynamic 
unzipping transition at a critical force value of order 10 pN, with different critical behavior for
homopolymers and for random heteropolymers. We extend results on the disorder-averaged behavior
of DNA's with random sequences [1] to the more experimentally accessible problem of unzipping a
single DNA molecule. As the applied force approaches the critical value, the double-stranded DNA 
unravels in a series of discrete, sequence-dependent steps that allow it to reach successively deeper 
energy minima. Plots of extension versus force thus take the striking form of a series of plateaus 
separated by sharp jumps. Similar qualitative features should reappear in micromanipulation exper- 
iments on proteins and on folded RNA molecules. Despite their unusual form, the extension versus 
force curves for single molecules still reveal remnants of the disorder-averaged critical behavior. 
Above the transition, the dynamics of the unzipping fork is related to that of a particle diffusing 
in a random force field; anomalous, disorder-dominated behavior is expected until the applied force 
exceeds the critical value for unzipping by roughly 5 pN. 

I. INTRODUCTION

Over the past decade, the experimental repertoire of biophysicists and structural biologists has expanded to include
some remarkable micromanipulation techniques. These single molecule methods are a natural complement to more
traditional scattering and spectroscopic measurements: Although they cannot ascertain structures at atomic resolution,
they do give important information about the organization of disordered or strongly-fluctuating systems, and
they yield valuable estimates of the forces and energies that stabilize a given structure. Moreover, micromanipulation
experiments on single molecules open a window into a rich and largely unexplored set of physical phenomena. One
can now measure entire distributions of molecular properties, without the requirement for averaging over a macroscopic
sample. Not only does the wealth of resulting data allow more stringent tests of ideas originally developed for
macroscopic systems, it also has the potential to reveal entirely new behavior that was not discernible in aggregate
results on heterogeneous populations of molecules [2,3,4]. In this paper, we study an example of a system, the unzipping
of double-stranded DNA (dsDNA), that shows exactly such novel response on the single molecule level. Our
results are also directly applicable to the unzipping of a single RNA hairpin, and similar ideas can be applied to the
force-induced denaturation of RNA's with more complicated secondary structures [5] and even to the stretching of
folded proteins [6].

In the DNA unzipping problem, the two single strands of a double-stranded DNA molecule with a randomly chosen 
base sequence are pulled apart under the influence of a constant force (Fig. 1). In addition to providing a surprisingly
good description of protein-coding DNA [7], the assumption of a random sequence gives us an analytically tractable
model; its solution then allows us to gain insight into a much broader class of systems. DNA unzipping thus serves 
as a model problem to illuminate the effect of sequence variation on a micromechanical experiment. 

In a previous brief communication [1], we showed that the average extension versus force curve of an ensemble of
random heteropolymers is markedly different from the corresponding curve for a homopolymer. Here, we move beyond 
averages over many different random sequences to examine the unzipping of a single dsDNA molecule. Interesting 
qualitative lessons emerge. Whereas a homopolymer gains considerable entropy by opening in response to a constant 
force, a heteropolymer unzips primarily for energetic reasons. In fact, the unzipping process is dominated by the 
presence of deep energy minima and is only mildly perturbed by thermal fluctuations. At any given applied force, the 
system will sit in the deepest available minimum; because the location of the minimum varies discontinuously with 
the applied force, the number of bases opened will show sharp jumps at certain force values. Moreover, the energy 
landscape is determined by the polymer's sequence, so the force-extension curve will be strongly sequence-dependent.



A number of theorists have recently addressed aspects of dsDNA unzipping [1,8,9,10,11,12,13,14]; the mechanical
properties of a single-stranded polynucleotide that can pair with itself have also received considerable attention
[15,16,17,18]. With a few exceptions (T^JI^], however, this work has been restricted to the study of homopolymers,
and thus does not overlap directly with the results presented here.

Although our model is chosen more for its simplicity than for a clear correspondence to a particular experiment in 
the literature, several related experiments have nonetheless been performed. Early studies by Lee and coworkers [19]
were followed by the ground-breaking work of Essevaz-Roulet, Bockelmann, and Heslot [20], who demonstrated the
feasibility of mechanically denaturing single dsDNA molecules, and showed that many features of their results could
be understood using equilibrium statistical mechanics. Subsequently, similar experiments have been performed using
an atomic force microscope [21,22]. In contrast to our calculations, this work was done in an ensemble in which
the positions of the two single-stranded ends are held fixed while an average force is measured. Because of subtleties 
associated with the statistical mechanics of single molecule systems, this constant extension ensemble is not equivalent 
in the usual sense to our constant force ensemble; the connection between the two will be discussed in more detail in 
Section VI. More recently, Liphardt and coworkers have mechanically unfolded several different short RNA molecules
related to a domain of the Tetrahymena thermophila ribozyme |Q. Here, a bead tethered to a force-measuring optical 
trap was used both to impose an extension and, with feedback, to monitor extension at fixed force — precisely the 
situation of interest in this paper. Alternatively, a constant force could be applied directly using a magnetic bead in
a constant magnetic field gradient [23].

In the remainder of this paper, we first, in Section II, describe in more detail the phase diagram of polynucleotide
duplexes and show how a coarse-grained model of the unzipping transition can be derived from more microscopic
descriptions of dsDNA. This model, which will form the basis of all subsequent calculations, is summarized in Eqs. (13)
through (15). For the purposes of comparison, we derive in Section III some results on the unzipping of homopolymeric
dsDNA. Section IV revisits in more detail the disorder-averaged force-extension curve examined in [1]. The bulk of
our new results on single-molecule unzipping appears in Section V. We show that the equilibrium extension versus
force curve of a single dsDNA molecule consists of a series of long plateaus followed by large jumps, and we derive a
statistical description of this striking behavior. We also demonstrate that, despite its choppy appearance, such a curve
contains hidden signatures of the smooth disorder-averaged behavior. Subsequent sections consider the relationship
between the conjugate constant force and constant extension ensembles (Sec. VI) and give a brief overview of the
dynamics of unzipping (Sec. VII). We point out that polynucleotide unzipping provides an experimental realization
of the famous Sinai problem of thermally activated diffusion in a quenched random force field [24,25]. Anomalous
quasi-localized dynamics persist up to roughly 5 pN above the unzipping transition. Finally, in Section VIII, we discuss
the implications of DNA unzipping for micromanipulation experiments on more complicated systems. The appendix
gives a brief description of the numerical methods used to generate results discussed in the body of the paper.

II. THE MODEL 

Figure 1 depicts the situation studied in this paper: One of the single strands from a double-stranded DNA molecule
is attached to a glass slide, and the other to a bead on which a constant force F is exerted. F could be created, for 
example, with magnetic tweezers, which have been used to exert constant piconewton-scale forces over hundreds of 
microns [23]. Optical tweezers or atomic force microscopes (AFM) with appropriate feedback can create a similar
effect p.|26|]. As a result of the applied force, the DNA partially "unzips" , breaking m bonds. As long as the force- 
elongation curve of the liberated single-stranded DNA is known, m can be related to the distance r between the ends 
of the two single strands, which is easily measured. Our main goal is to understand how the equilibrium ensemble 
average (m) (where the angle brackets indicate an average over thermal noise) depends on F and on the base sequence 
of the DNA strand. 

In certain limiting cases, the dependence of m on F is easy to understand. One might expect that at large enough 
forces the dsDNA will unzip completely, whereas for very small forces at most a few bases will open. We show below 
that these two regimes are separated by a sharp first order phase transition. Below the critical force F c , only a finite
number of bases at the end of the double strand are pulled open; in the thermodynamic limit of an infinitely long DNA 
molecule, the pulling force thus has no effect on the fraction of open bases, which remains very small in physiological 
conditions. Above F c , the entire molecule unzips, and the fraction of open bases jumps discontinuously to one. This 
phase diagram is sketched in the inset to Figure El As F approaches F c from below, the number m of unzipped 
bases at the end of the molecule diverges. Because this divergence is entirely a surface phenomenon, the unzipping 
transition can be thought of as the one-dimensional analog of a continuous wetting transition [27].

The effect of base sequence on the force-elongation curve is less straightforward. We can gain some insight into 
the role of a variable sequence by considering the problem of unzipping a DNA molecule where each successive 
base is chosen at random, with at most short-ranged correlations between bases. Although the sequence of protein- 
coding DNA is certainly not random in any strict sense, it nonetheless appears to many statistical criteria to fit this 
description (up to a length scale set by the sequence's mosaic structure) jg]. Deviations from randomness that escape 
these tests presumably involve fairly subtle multi-point correlations. Although the structure of the protein for which 
the DNA codes is likely to depend on such correlations, the mechanical denaturation of the DNA itself, which depends 
only on the cumulative energy cost of opening m bases, should be relatively insensitive to them. Simulations of the
more complicated problem of pulling on folded RNA's have shown good agreement with the predictions of a random
model [Q. It is thus reasonable, at least as a first approximation, to take the DNA sequence being unzipped to be 
random and uncorrelated. In the remainder of this section, we develop a mathematical description of the unzipping 
of such a DNA sequence by a constant force. 

A. Semi-Microscopic Models 

The bulk thermally-driven melting transition of dsDNA (see Fig. pi) can be described at varying levels of detail by 
a number of models, all of which are expected to give the same universal behavior on long enough length scales. One 
popular choice is an Ising-like description, in which a base pair is taken to be in one of two discrete states — open or 
closed. By convention, the free energy of an unconstrained base pair in the open state is set to zero. A melted stretch 
of single-stranded DNA flanked by two unmelted regions must form a closed loop, and a loop factor accounts for the 
loss of entropy caused by this constraint [28]. The Hamiltonian of a semi-infinite strand can be written as a sum of
(free) energies associated with successive paired and unpaired regions:

$$\mathcal{H}_I = \sum_i \left[ \sum_{n=c_i}^{o_i-1} \epsilon_n + 2J + f(c_{i+1} - o_i) \right] . \qquad (1)$$



Here base positions are indexed by $n \in \{0, 1, 2, \ldots\}$, and the $i$th closed and open sections start at base numbers $c_i$
and $o_i$ respectively (see Fig. ||). Each base pair gains an energy $\epsilon_n$ from being closed; sequence dependent stacking
interactions can be included by adding an additional energy $\epsilon_{n,n+1}$ [29]. For the case of a random DNA sequence, the
$\epsilon_n$ are independent random variables. The energy $2J$ per open section gives the energetic cost of initiating a melted
region, and $f(c_{i+1} - o_i) \propto \ln[2(c_{i+1} - o_i)]$ is the entropic penalty associated with forming a closed loop of length
$2(c_{i+1} - o_i)$. If there are open bases at the end of the molecule, before the first closed section, they are counted as
the zeroth open section and do not incur any loop penalty. The model's partition function is a sum over all possible
opening and closing points, $Z_I = \sum_{0 \le c_1 < o_1 < \cdots < c_n < o_n < \cdots} \exp(-\mathcal{H}_I / k_B T)$.

Alternatively, some models of the melting transition are written in terms of the position of each base in three-dimensional
space [30]. In the continuum limit, the simplest such description of a dsDNA of finite length N has the
Hamiltonian

$$\mathcal{H}_c = \int_0^N dn \left\{ \frac{d\, k_B T}{4ab} \left( \frac{d\mathbf{R}}{dn} \right)^2 + V_n[\mathbf{R}(n)] \right\} , \qquad (2)$$

where R(n) is the relative displacement of the two single strands at base pair n, d is the spatial dimension, a is the
backbone length of a chemical monomer along a single strand, and b is the Kuhn length of single-stranded DNA (see
Fig. 0); the factor of 1/ab appears instead of the more usual $1/b^2$ [31] because n indexes base pairs rather than Kuhn
segments. We will usually be interested in the limit $N \to \infty$ of a semi-infinite polymer, just as for the Ising-like
model. By convention, R(n) = 0 when the nth set of bases are paired. Because we will be especially interested in the
distance between the ends of the two single strands, it is useful to define the extension

$$\mathbf{r} \equiv \mathbf{R}(0) . \qquad (3)$$

The first term in Eq. (2) describes the entropic elasticity of the single strands [31] and thus has the same effect as the
loop factors in the Ising-like model. The second term accounts for the attractive interactions between the two single
strands. Coarse-grained over a number of bases, they are described by a phenomenological potential energy term

$$V_n[\mathbf{R}(n)] = [1 + \tilde{\eta}(n)]\, h[\mathbf{R}(n)] . \qquad (4)$$

Here h is a short-ranged attractive potential, and the variation with base sequence of the strength of the attraction
between strands is described by $\tilde{\eta}(n)$. Standard methods show that the continuum partition function
$Z_c(\mathbf{R}, N) = \int \mathcal{D}[\mathbf{R}'(n)] \exp(-\mathcal{H}_c[\mathbf{R}'] / k_B T)$ obeys an imaginary time Schrödinger equation [31].

Either model can readily be extended to include a force pulling apart the double-stranded molecule. We first show 
explicitly how this can be done neglecting long-ranged interactions (e.g. excluded volume or base-pairing interactions) 
within the liberated single strands. Subsequently, we will argue that including such effects will lead to only minor 
changes in our results near enough to the transition. A constant force acting at the end of the DNA (n = 0) to 
separate the two single strands contributes an energy that is linear in their separation. In the case of the continuum 
model (2), one must thus add a term to the Hamiltonian of the form

$$\mathcal{H}_{c,\mathrm{pull}}(\mathbf{F}) = -\mathbf{F} \cdot \mathbf{r} = \int_0^N dn\, \mathbf{F} \cdot \frac{d\mathbf{R}}{dn} . \qquad (5)$$

In writing the second equality, we have neglected the effect of the other end of the dsDNA at $N \to \infty$; with a physical
polymer of finite length N, this approximation should be valid as long as the number of open bases $m \ll N$, so that
$\mathbf{R}(N) \approx 0$.

Unlike the continuum model, the Ising-like model does not keep track of the positions of the open bases. We must
thus take an alternative view of the effect of an unzipping force. The last equality of equation (5) gives a hint of
how to do this. Suppose that, as in Fig. 0, the first closed section of dsDNA starts at base $c_1$, so that $m = c_1$
bases are unzipped by the force. In the discrete Ising-like model, each liberated single strand can be described as
a string of m individual monomers. The nth such monomer contributes a displacement $\mathbf{u}_n^{(1)}$ or $\mathbf{u}_n^{(2)}$ to the total end
to end distance of the single strand, where the superscripts distinguish the two strands. The energy of unzipping is
thus $-\mathbf{F} \cdot \mathbf{r} = -\sum_{n=0}^{m} \mathbf{F} \cdot \mathbf{u}_n^{(1)} + \sum_{n=0}^{m} \mathbf{F} \cdot \mathbf{u}_n^{(2)}$. Note that there is no reason to extend the sums over n to infinity;
the positions of base pairs beyond the first closed pair have no effect on the end to end distance r = R(0). We
would now like to trace over the u's to obtain a contribution to the Hamiltonian that depends only on the number
of open monomers $m = c_1$. The precise result will depend on the model used to describe the elastic properties of a
single-stranded monomer. For any reasonable choice, however, the traces over the different u's must decouple, leading
to a free energy of the form 2mg(F). Here g(F) is the change in free energy of a single-stranded monomer caused by
applying a tension F; by definition, g(0) = 0. Because the monomers gain energy by aligning with the pulling force,
g(F) decreases with increasing F. For example, the continuum model Hamiltonian (2) is quadratic in dR/dn and
thus describes a polymer that responds linearly to an arbitrarily large force. Such a Gaussian model results in a g(F)
that is quadratic in F:

$$g(F) = -\frac{a b F^2}{2 d\, k_B T} . \qquad (6)$$

Similarly, for an inextensible freely jointed chain, one finds [32]

$$g(F) = -k_B T\, \frac{a}{b} \ln\left[ \frac{k_B T}{F b} \sinh\left( \frac{F b}{k_B T} \right) \right] . \qquad (7)$$



In these equations, a is again the backbone distance between bases; the factor of a/b is necessary because g(F) is 
defined as the free energy per chemical monomer, not per Kuhn length. More generally, if the force $F_{ss}(x)$ exerted by
the single-stranded polymer as a function of the extension x per base can be measured, then

$$g(F) = \int_0^{x(F)} F_{ss}(x')\, dx' - F x(F) = -\int_0^F x(F')\, dF' , \qquad (8)$$

where x(F) is the inverse function of $F_{ss}(x)$.
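
As a numerical illustration of Eq. (8), the sketch below evaluates g(F) for a freely jointed chain in two ways: from the standard closed form in three dimensions, $-k_B T (a/b) \ln[\sinh(x)/x]$ with $x = Fb/k_B T$, and by integrating the mean extension per base, $a[\coth(x) - 1/x]$, over the force. The parameter values ($k_B T = 4.1$ pN nm, a = 0.7 nm, b = 1.5 nm) and all function names are assumptions chosen for illustration; only the Kuhn length is of the order quoted in the text.

```python
import math

KT = 4.1   # thermal energy in pN*nm near room temperature (assumed value)
A = 0.7    # backbone length per base in nm (assumed, not taken from the text)
B = 1.5    # Kuhn length of ssDNA in nm (of order 15 Angstrom)

def g_fjc(force):
    """Closed-form freely jointed chain free energy per base (pN*nm) at tension `force` (pN)."""
    x = force * B / KT
    if x < 1e-4:
        return -KT * (A / B) * x * x / 6.0        # leading small-force term
    return -KT * (A / B) * math.log(math.sinh(x) / x)

def extension_fjc(force):
    """Mean extension per base (nm): a * (coth(x) - 1/x), i.e. the Langevin function."""
    x = force * B / KT
    if x < 1e-4:
        return A * x / 3.0                         # leading small-force term
    return A * (1.0 / math.tanh(x) - 1.0 / x)

def g_from_extension(force, steps=20000):
    """g(F) = -integral_0^F x(F') dF', evaluated with the midpoint rule."""
    h = force / steps
    return -h * sum(extension_fjc((i + 0.5) * h) for i in range(steps))

def g_gaussian(force, d=3):
    """Gaussian-chain free energy per base, -a b F^2 / (2 d kT)."""
    return -A * B * force ** 2 / (2 * d * KT)

if __name__ == "__main__":
    for f in (0.1, 5.0, 12.0):
        print(f, g_fjc(f), g_from_extension(f), g_gaussian(f))
```

The two freely jointed chain evaluations should agree to within the quadrature error, and the Gaussian expression should match them only at small force, where the chain responds linearly.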

Regardless of the exact form of g(F), the effect of an unzipping force can be included in the Ising-like model by
adding a term to the Hamiltonian (1) that gives the free energy of the unzipped monomers under tension. Because
$m = c_1$, we have $\mathcal{H} = \mathcal{H}_I + \mathcal{H}_{I,\mathrm{pull}}$ with

$$\mathcal{H}_{I,\mathrm{pull}}(F) = 2 c_1 g(F) . \qquad (9)$$

Since g(F) < 0, this term favors increasing $c_1$, and thus unzipping the dsDNA.



B. Reduction to One Degree of Freedom 



Semi-microscopic models like those just discussed contain far more detail than is necessary to describe the unzipping 
transition. Our calculations would simplify if we could integrate out nonessential degrees of freedom to obtain a 
description that focuses on the number of unzipped bases m. The full partition function of the Ising-like model is 
a sum over all of the closing and opening points $c_1, c_2, c_3, \ldots$ and $o_1, o_2, o_3, \ldots$. Among these parameters, the only
one that determines the number of bases that have been unzipped is $c_1$. Hence we focus on a constrained partition
function with $c_1 = m$ fixed,

$$\exp\left[ -\frac{\mathcal{E}(m)}{k_B T} \right] = \frac{1}{Z^0} \sum_{m = c_1 < o_1 < c_2 < o_2 < \cdots} \exp\left[ -\frac{\mathcal{H}_I}{k_B T} - \frac{\mathcal{H}_{I,\mathrm{pull}}(F)}{k_B T} \right] , \qquad (10)$$

where the partition function $Z^0 = \sum_{0 = c_1 < o_1 < c_2 < o_2 < \cdots} \exp[-\mathcal{H}_I / k_B T - \mathcal{H}_{I,\mathrm{pull}}(F) / k_B T]$ with $c_1$ constrained to
be zero is included so that $\mathcal{E}(0) = 0$. This expression defines the function $\mathcal{E}(m)$; it can be introduced in a similar
manner in the continuum model, by adding the constraint R(m) = 0 to the partition function
$Z_c(\mathbf{R}, N) = \int \mathcal{D}[\mathbf{R}'(n)] \exp(-\mathcal{H}_c / k_B T - \mathcal{H}_{c,\mathrm{pull}} / k_B T)$ and replacing the attractive potential $V_n$ with a hard core repulsion for
n < m. $\mathcal{E}(m)$ gives the change in free energy from unzipping exactly m bases under the influence of a force F. It can
be written as the sum of the free energy 2mg(F) of the m liberated base pairs and of the change in free energy of the
dsDNA when it is shortened by m base pairs. This second term takes account of any fluctuations that open base pairs
beyond the first closed base $c_1$ and is independent of F. For homopolymeric DNA, this term takes the form $-m g_0$,
where $g_0 < 0$ is the average free energy per base pair of dsDNA. Once sequence heterogeneity is present, however, we
must include sequence-dependent deviations from the average. If the deviation from the average on opening the nth
base is $\eta(n)$, then $\mathcal{E}(m)$ can be written as

$$\mathcal{E}(m) = [2 g(F) - g_0]\, m + \sum_{n=1}^{m} \eta(n) . \qquad (11)$$

Consider now the statistics of the random contribution $\eta(n)$, assuming that the underlying DNA sequence is random
and uncorrelated. The function $\eta(n)$ reflects this bare sequence [represented by $\tilde{\eta}$ in the continuum model potential (4)]
dressed by thermal fluctuations. As long as the dsDNA is well below its melting temperature, one expects that $\eta$ will
be a random variable with correlations that decay on the scale of the finite correlation length of the dsDNA. If we
are only interested in long length-scale properties, we can thus take $\eta$ to be Gaussian white noise. It is convenient to
define a quantity

$$f = 2 g(F) - g_0 ; \qquad (12)$$

f is positive below the unzipping transition and negative above it. Passing to the continuum limit, we can then write

$$\mathcal{E}(m) = f m + \int_0^m dn\, \eta(n) , \qquad (13)$$

where $\eta(n)$ is a zero mean Gaussian random variable which satisfies

$$\overline{\eta(n)\, \eta(n')} = \Delta\, \delta(n - n') . \qquad (14)$$

Here the overbar indicates a "disorder average" over different realizations of the random base sequence. The associated
partition function is simply, up to an unimportant multiplicative constant,

$$Z = \int_0^\infty dm\, \exp\left[ -\frac{\mathcal{E}(m)}{k_B T} \right] . \qquad (15)$$

Eqs. (13) through (15) define the basic model that we will study for the remainder of this paper. It is simple
enough to allow a number of exact predictions, but still correctly captures the coarse-grained features of unzipping in
the presence of sequence heterogeneity. It is not difficult to see that our model shows a sharp unzipping transition:
At F = 0, $f = 2g(0) - g_0 = -g_0$ is positive. As the pulling force F is increased from 0, g(F) becomes negative, and
f decreases but remains positive. $\mathcal{E}(m)$ thus grows linearly for large m, and at most a finite number of bases near
the end of the dsDNA can be unzipped. These do not contribute appreciably to the average free energy per base pair
of a very long molecule, which remains $g_0$ as at zero force. As F increases and g(F) becomes increasingly negative,
however, f changes sign at some critical force value $F_c$ satisfying

$$2 g(F_c) = g_0 . \qquad (16)$$

Upon expanding about $F_c$ we see that to leading order, $f \sim F_c - F$. For $F > F_c$, the average slope f of $\mathcal{E}(m)$
is negative, and $\mathcal{E}(m)$ tends towards negative infinity for large m. It is thus advantageous to unzip the dsDNA
completely. With all base-pairs unzipped, the average free energy per pair becomes 2g(F). The discontinuous slope
at $F_c$ of the free energy per base pair as a function of F (see Fig. ||) indicates that the bulk transition is first order.
Surface quantities such as $\langle m \rangle$ will nonetheless diverge as the transition is approached, just as in a critical wetting
transition near a conventional first order phase transition [27]. The precise surface behavior in this one-dimensional
system will be the subject of subsequent sections.
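
To make the model concrete, here is a minimal numerical sketch of a discretized version of the energy landscape: one independent Gaussian increment of variance Delta per base stands in for the white noise, and the integral over m is replaced by a sum over a finite chain. The units ($k_B T = 1$, Delta = 1), the chain length, the seeds, and the function names are all assumptions made for illustration and are not the numerical method of the appendix.

```python
import math
import random

def landscape(num_bases, delta=1.0, seed=0):
    """Cumulative sequence noise S(m) = sum of eta_n for n <= m, with S(0) = 0."""
    rng = random.Random(seed)
    walk = [0.0]
    for _ in range(num_bases):
        walk.append(walk[-1] + rng.gauss(0.0, math.sqrt(delta)))
    return walk

def mean_open(f, walk, kT=1.0):
    """Thermal average <m> for E(m) = f*m + S(m), with m = 0 .. len(walk) - 1."""
    energies = [f * m + s for m, s in enumerate(walk)]
    e_min = min(energies)                       # shift energies to avoid overflow
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)
    return sum(m * w for m, w in enumerate(weights)) / z

if __name__ == "__main__":
    walk = landscape(2000, seed=1)              # one fixed "sequence"
    for f in (1.0, 0.5, 0.3, 0.2, 0.1, 0.05):
        print(f, mean_open(f, walk))
```

For a fixed realization of the noise, the printed $\langle m \rangle$ can only grow as f decreases, since its derivative with respect to f is minus the thermal variance of m divided by $k_B T$. Because the chain in this sketch is finite, the values saturate once the typical opening becomes comparable to the chain length.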

For dsDNA in physiological conditions, one can ignore the rare fluctuational openings of base pairs in the bulk and
use published base pairing energies to estimate the parameter values in our model. The pairing energies typically vary
between roughly 1 and 3 $k_B T$ per base [33]; one thus finds $|g_0| \sim 2 k_B T$ and $\Delta \sim (k_B T)^2$. A typical Kuhn length for
single-stranded DNA (ssDNA) is $b \sim 15$ Å [32,34]; inserting this value into the freely jointed chain expression for g(F)
(Eq. 7) gives a pulling force F of order 10 pN at the unzipping transition. As we shall see, the sequence randomness
dominates when $|f| \lesssim \Delta / k_B T \sim k_B T$; randomness is hence important whenever there is appreciable unzipping in
heterogeneous polynucleotide sequences in physiological conditions.
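
The order-of-magnitude estimate of the critical force can be reproduced with a short calculation. The sketch below solves $2 g(F_c) = g_0$ by bisection for a freely jointed chain, using the standard three-dimensional closed form for g(F). The backbone length per base (a = 0.7 nm) and the value $k_B T = 4.1$ pN nm are assumptions that do not appear in the text; $|g_0| = 2 k_B T$ and b = 1.5 nm follow the estimates above, and the function names are illustrative.

```python
import math

KT = 4.1         # thermal energy in pN*nm (assumed room-temperature value)
A = 0.7          # backbone length per base in nm (assumed)
B = 1.5          # Kuhn length of ssDNA in nm
G0 = -2.0 * KT   # average free energy per base pair of dsDNA, |g0| ~ 2 kT

def g_fjc(force):
    """Freely jointed chain free energy per base (pN*nm) at tension `force` (pN)."""
    x = force * B / KT
    if x < 1e-4:
        return -KT * (A / B) * x * x / 6.0
    return -KT * (A / B) * math.log(math.sinh(x) / x)

def critical_force(lo=0.0, hi=100.0, tol=1e-9):
    """Bisect f(F) = 2 g(F) - g0, which is positive at F = 0 and decreases with F."""
    def f(force):
        return 2.0 * g_fjc(force) - G0
    assert f(lo) > 0.0 > f(hi), "bracket must straddle the transition"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print("F_c in pN:", critical_force())
```

With these assumed inputs the root lies at a force of order 10 pN; the result shifts with the choice of a, b, and $g_0$, so it should be read as an estimate rather than a prediction.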

The model of Eqs. (13) through (15) is considerably more general than the semi-microscopic models from which
we derived it. For example, once g(F) has been ascertained [e.g. by measuring the force-extension curve of ssDNA in
appropriate conditions [4,32,34] and using Eq. (8)], it can be used without reference to any underlying description of
the ssDNA. In fact, many predictions of our model are independent of its exact form. Similarly, most models of dsDNA
(or of RNA hairpins) can be used to define parameters $g_0$ and $\Delta$; all of the relevant information about the duplex is
contained in these two numbers. We also expect that our description applies even when non-local interactions along
the ssDNA backbone are allowed. All that is required is that the free energy of the ssDNA be proportional to m, so
that a function g(F) can be defined. For example, a polymer in a good solvent under tension can be described as a
string of blobs [35]. Once m is larger than the blob size, as must occur close enough to the unzipping transition, the
free energy of the single strands will be proportional to m. In fact, in physiological conditions and at the forces of
order 10 pN required to unzip dsDNA, the blob size will be at most a few monomers, meaning that excluded volume
interactions can be neglected entirely in a first approximation. Likewise, a model of single stranded polynucleotides
with uniformly attractive, non-random base pairing interactions (tending to produce hairpins) predicts a free energy
proportional to the number of bases in the strand [15]. This model agrees well with experimental force-extension
curves for ssDNA. The same calculations show that the fraction of bases in the liberated single strands involved in 
intra-strand pairing interactions will be small at the unzipping transition. Sequence variation will further suppress 
such pairing: Because not all bases can pair with each other, it will generally be necessary to make a large loop in 
order to bring together two stretches of bases that can pair to form a stem. This means that more work must be 
done against the pulling force for the same gain in base pairing energy. Although it might still be possible for a stem 
region of atypically high GC content to pair in this way, in a truly random sequence the probability of finding such a 
region decays exponentially with its length. 

C. Related physical systems 

Although the main focus of this paper will be the mechanical unzipping of polynucleotide duplexes, our formalism 
also applies to other experiments and physical systems. For example, an alternative method for unzipping DNA is to 
force one of the single strands through a very small pore by applying an electric field [36]. If the pore is so narrow that
double-stranded DNA cannot fit through it, and if the applied field is strong enough, one of the single strands can 
enter the pore and be drawn through it, thereby unzipping the duplex (see Fig. @). In this case, the analog of g(F) 
is the electrostatic energy gained by the single strand passing through the pore, reduced by any entropic penalty the 
other single strand must pay due to confinement by parts of the pore or the adjoining walls [37]. Continuum models
such as Eq. (2) are also commonly used to describe a number of other systems; in several of them, there is a natural
analog to the pulling force F c . Examples include the adsorption of a Gaussian random heteropolymer, where F c maps 
directly to a force pulling the end of the polymer away from the adsorbing surface [38], and a flux line in a type II
superconductor bound to a fragmented columnar defect [39], where F c can be viewed as the magnetic field strength
perpendicular to the defect. In addition, the Hamiltonian $\mathcal{H}_c + \mathcal{H}_{c,\mathrm{pull}}$ bears a strong resemblance to models of the
wetting transition in two dimensions in a wedge with angle close to 180 degrees [40].

III. STATISTICAL MECHANICS OF HOMOPOLYMER UNZIPPING 

Before tackling the more difficult problem of unzipping a double-stranded molecule with a random base sequence, 
we describe some results for a uniform sequence [1]. If the energy cost of opening each successive base pair is the same,
then the deviation $\eta(n)$ from the average vanishes identically, and $\mathcal{E}(m) = f m$. Even if, as would be the case for an
alternating base sequence, $\eta(n)$ is a non-zero periodic function, we expect that on scales longer than its period, $\eta(n)$
can safely be set to zero. In this section, we show explicitly that the semi-microscopic continuum model discussed
above [Eqs. (2) and (5)] gives results identical to those following from the simpler single degree of freedom description.

Equilibrium statistical mechanics in a linear potential is straightforward. The partition function of our minimal
model is simply $Z = \int_0^\infty dm\, \exp(-m f / k_B T) = k_B T / f$, and the probability of opening exactly m bases is
$(f / k_B T) \exp(-m f / k_B T)$. The equilibrium moments of m can be obtained from derivatives of the free energy
$G(f) = -k_B T \ln Z$ with respect to f: $\langle m \rangle = \partial G / \partial f = k_B T / f$, $\langle m^2 \rangle - \langle m \rangle^2 = -k_B T\, \partial^2 G / \partial f^2 = (k_B T / f)^2$, and so on.
Recalling that $f \sim F_c - F$, we see that $\langle m \rangle$ exhibits a power law divergence near the unzipping transition;

$$\langle m \rangle \sim (F_c - F)^{-1} \quad \text{(homopolymer)}. \qquad (17)$$

The divergence of $\langle m \rangle$ has a simple origin: Although the absolute minimum of $\mathcal{E}(m)$ remains at m = 0 everywhere
below the transition, the system explores all configurations with $\mathcal{E}(m) \lesssim k_B T$, or equivalently $m \lesssim k_B T / f$; this of
course suggests the same scaling for $\langle m \rangle$ found in the exact calculation. The homopolymer thus opens partially for
$F < F_c$ entirely in order to gain entropy. We shall see in subsequent sections that a very different physical mechanism
dominates in the unzipping of heteropolymers.
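
The homopolymer moments are easy to check numerically. The following sketch integrates the Boltzmann weight $\exp(-f m / k_B T)$ with the midpoint rule and compares the mean and variance of m with $k_B T / f$ and $(k_B T / f)^2$. The cutoff, the step count, and the function name are illustrative assumptions; units are chosen so that $k_B T = 1$ by default.

```python
import math

def homopolymer_moments(f, kT=1.0, cutoff=60.0, steps=100000):
    """Mean and variance of m under the weight exp(-f m / kT).

    The integral over m is truncated at cutoff * kT / f, where the weight
    is exp(-cutoff), and evaluated with the midpoint rule.
    """
    m_max = cutoff * kT / f
    h = m_max / steps
    z = 0.0
    first = 0.0
    second = 0.0
    for i in range(steps):
        m = (i + 0.5) * h
        w = math.exp(-f * m / kT)
        z += w
        first += m * w
        second += m * m * w
    mean = first / z
    return mean, second / z - mean * mean

if __name__ == "__main__":
    for f in (1.0, 0.5, 0.25):
        mean, var = homopolymer_moments(f)
        print(f, mean, 1.0 / f, var, 1.0 / f ** 2)
```

Halving f doubles the mean number of open bases, which is the $(F_c - F)^{-1}$ divergence written in terms of f.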

A. Connection to non-Hermitian Delocalization

A different perspective on the mechanical denaturation of a homopolymer follows from viewing the energy $\mathcal{H}_c + \mathcal{H}_{c,\mathrm{pull}}$
of the continuum model [Eqs. (2) and (5)] as an imaginary time quantum mechanical action. The partition
function $Z(\mathbf{R}, N)$ of a strand of length N, subject to the constraint $\mathbf{R}(N) = \mathbf{R}$, satisfies the partial differential
equation [31]

$$\frac{\partial Z}{\partial N} = \frac{b^2}{d} \left( \nabla - \frac{\mathbf{F}}{k_B T} \right)^2 Z - \frac{V(\mathbf{R})}{k_B T}\, Z \equiv -\mathcal{L}(\mathbf{F})\, Z , \qquad (18)$$



where the sequence-dependent function $V_N(\mathbf{R})$ is replaced by the N-independent potential $V(\mathbf{R})$ for a homopolymer.
In order to avoid a proliferation of factors of a/b, we assume that the backbone distance a between chemical monomers
is equal to the Kuhn length b. When F = 0, Eq. (18) is just an imaginary-time Schrödinger equation. With the addition
of a nonzero pulling force F, the strict correspondence with conventional quantum mechanics is lost. Nonetheless,
much can be learned by studying the evolution operator $\mathcal{L}$ using the language of quantum mechanics. This avenue
has been pursued for the formally identical problem of a flux line pinned to a defect in a type II superconductor [41].
In this subsection, we show explicitly that results from this more microscopic approach can be recovered from the
simplified model embodied in Eqs. (13) through (15).

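In analyzing Eq. (18), it is useful to view the force F as a constant, imaginary vector potential. The "gauge
transformation" operator

$$\mathcal{U} : \psi(\mathbf{R}) \mapsto \exp(\mathbf{F} \cdot \mathbf{R} / k_B T)\, \psi(\mathbf{R}) \qquad (19)$$

can thus be used to relate the operator $\mathcal{L}(\mathbf{F})$ at a force F to the Hermitian operator $\mathcal{L}(0)$:

$$\mathcal{L}(\mathbf{F}) = \mathcal{U}\, \mathcal{L}(0)\, \mathcal{U}^{-1} . \qquad (20)$$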

Under the same transformation, the eigenfunctions ip^(R) of £(F) are given by 

<£(R) = Z#°(R) = e F - r / fcBT ^(R) . (21) 



Eq. (21) shows that exerting a non-zero force $\mathbf{F}$ biases the eigenfunctions in the direction of the force. This transformation is valid as long as the new eigenfunction $\psi_n^R$ satisfies the same boundary conditions as the untransformed eigenfunction. If we think of an isolated polymer in a box whose size tends towards infinity, the appropriate boundary conditions are that $\psi_n^R$ be well behaved at infinity; given the form of $\mathcal{U}$, this is equivalent to demanding that the eigenfunction $\psi_n^0(\mathbf{R})$ of the Hermitian problem decay at least as fast as $\exp(-FR/k_BT)$ for large $R = |\mathbf{R}|$. When this condition holds for the $n$th eigenfunction, the corresponding eigenvalues of $\mathcal{L}(0)$ and $\mathcal{L}(\mathbf{F})$ will be identical, and the eigenfunctions will be related according to Eq. (21). Because, according to Eq. (18), the contribution of each eigenvalue $\lambda_n$ to the partition function decays like $\exp(-\lambda_n N)$, the smallest eigenvalue $\lambda_0$ dominates in the limit of a very long polymer duplex. We are interested in conditions in which the dsDNA is stable in the absence of a pulling force; in this case, $\mathcal{L}(0)$, which describes the native, unpulled polymer, must have at least one bound state. The ground state eigenvalue $\lambda_0 < 0$ differs from the free energy per length $g_0$ of dsDNA introduced previously only by a factor of $k_BT$: $g_0 = k_BT\lambda_0$. Because $V(\mathbf{R})$ is a short-ranged potential, the ground state wavefunction $\psi_0^0(\mathbf{R})$ should decay like $\exp(-\kappa_0 R)$ for large $R$, with the decay rate given by

$$\kappa_0 = \frac{\sqrt{d\,|\lambda_0|}}{b} , \tag{22}$$

where $d$ is the spatial dimension. When applied to the ground state wavefunction, the gauge transformation of Eq. (21) thus breaks down at a force of magnitude $F_c$ given by

$$F_c = k_BT\kappa_0 = \frac{k_BT}{b}\sqrt{\frac{d\,|g_0|}{k_BT}} . \tag{23}$$

It is natural to regard this force as the location of the unzipping transition. Indeed, one can show [41] that far from the ends of a long polymer, the probability that a given base pair will be separated by a displacement $\mathbf{R}$ is $P_\infty(\mathbf{R}) = \psi_0^L(\mathbf{R})\,\psi_0^R(\mathbf{R})$, the product of the left and right ground state eigenfunctions. For $F < F_c$, the two gauge transformations cancel each other, and $P_\infty(\mathbf{R}) = [\psi_0^0(\mathbf{R})]^2$. Thus, below $F_c$ paired bases in the bulk of the dsDNA always stay near each other, and the polymer is below the unzipping transition (see Fig. 6). Conversely, above $F_c$, where the gauge transformation is no longer valid, the eigenfunctions $\psi_0$ are dominated by $\mathbf{F}$ and are extended. Indeed, one can demonstrate that they become plane waves as $R \to \infty$. The two single strands are then typically widely separated, and the DNA is above an unzipping transition given by Eq. (23).

Upon inserting the expression for $g(F)$ [Eq. (6)] appropriate for the Gaussian single-stranded polymer into our previous criterion $2g(F_c) = g_0$, we obtain a value for the critical unzipping force $F_c$ identical to Eq. (23). In fact, provided the duplex binding potential $V(\mathbf{R})$ vanishes as $R \to \infty$, the ground state eigenfunction $\psi_0^R(\mathbf{R})$ will approach a nonzero constant for large $R$ above $F_c$. One can then read off $\lambda_0 = -b^2F^2/d(k_BT)^2$ directly from Eq. (18); the free energy above the transition is simply $k_BT\lambda_0 = 2g(F)$, a natural result given that above the unzipping transition the DNA is entirely in the single-stranded form. Within the present formalism, one can also obtain a closed-form expression for $\lambda_0$, and hence for the free energy per monomer, below the unzipping transition. For $F < F_c$, the transformation (21) is valid, and $\lambda_0 = -b^2\kappa_0^2(T)/d$, independent of $F$. Both the entropy, given by a derivative of $k_BT\lambda_0$ with respect to $T$, and the average extension per nucleotide in the bulk, given by a derivative of $k_BT\lambda_0$ with respect to $F$, change discontinuously at $F_c$ (see Fig. 5). The bulk unzipping transition is thus first order, as is the case for the related problem of a single flux line torn away from a columnar defect in a type II superconductor [41].

Because $k_BT\lambda_0$ is the bulk free energy per monomer, its derivatives tell us nothing about the diverging surface precursors to the unzipping transition. To study surface effects within the quantum mechanical formalism, note that the probability that the ends of the two single strands are separated by a displacement $\mathbf{r} = \mathbf{R}(0)$ is given by

$$P_0(\mathbf{r}) = \psi_0^R(\mathbf{r}) \sim \exp\left[\frac{\mathbf{F}\cdot\mathbf{r}}{k_BT} - \kappa_0 r\right] , \tag{24}$$

where the last expression is valid outside the range of the potential $V(\mathbf{R})$. Focusing, for simplicity, on the case of one spatial dimension ($d = 1$), and replacing the vectors $\mathbf{R}$ and $\mathbf{r}$ by the scalars $R$ and $r$, it follows that the average distance $\langle r \rangle$ between the ends of the two single strands diverges like

$$\langle r \rangle = \frac{(\kappa_0 - F/k_BT)^{-2} + (\kappa_0 + F/k_BT)^{-2}}{(\kappa_0 - F/k_BT)^{-1} + (\kappa_0 + F/k_BT)^{-1}} \sim \frac{1}{F_c - F} . \tag{25}$$

Slightly more involved calculations [41] give the decay of the end to end distance as the bulk value is approached:

$$\langle R(n) \rangle = \langle r \rangle \exp(-n/n^*) , \tag{26}$$

where $\langle R(n) \rangle$ is the average distance between the two single strands at base pair $n$. The healing length $n^*$ diverges near $F_c$ as

$$n^* = \frac{(k_BT)^2}{b^2(F_c^2 - F^2)} \sim \frac{1}{F_c - F} . \tag{27}$$



To check these results against the single degree of freedom model defined by Eqs. (13)-(15), one must translate the number of unzipped base pairs $m$ into a distance $r$ between the ends of the two single strands. When $m$ base pairs have been unzipped, $r$ is simply the end to end distance of a Gaussian polymer of length $2m$ subject to a force $F$; it thus has distribution [35]

$$P_0(r|m) = \frac{1}{\sqrt{4\pi m b^2}} \exp\left\{-\frac{[r - 2mb^2F/k_BT]^2}{4mb^2}\right\} . \tag{28}$$

The probability that precisely $m$ base pairs have been unzipped is $P(m) = (f/k_BT)\exp(-mf/k_BT)$, so the full distribution of $r$ is given by

$$P_0(r) = \int_0^\infty dm\, P(m)\,P_0(r|m) . \tag{29}$$

Evaluating this integral leads to the prediction summarized in Eq. (24). Similarly, the distribution $P_n(R)$ of $R(n)$ for any $n$ can be obtained by summing over a conditional distribution, assuming that $m > n$ bases are open, and another one given that $m < n$ bases are open. The latter distribution is well-approximated, except for $n$ very near $m$, by the bulk distribution for dsDNA $P_\infty[R(n)]$ introduced earlier. Thus, we find that

$$P_n(R) = \int_n^\infty dm\, P(m)\,P_0(R|m-n) + P_\infty(R)\int_0^n dm\, P(m) = \exp\left(-\frac{nf}{k_BT}\right)P_0(R) + \left[1 - \exp\left(-\frac{nf}{k_BT}\right)\right]P_\infty(R) , \tag{30}$$

where $P_0(R|m-n)$ and $P_0(R)$ are given by Eqs. (28) and (29). Since $P_\infty(R)$ must be symmetric with respect to $R = 0$, its average vanishes. Upon using Eq. (30) to evaluate $\langle R(n) \rangle$ and recalling that $f = |g_0| - 2|g(F)| = (F_c^2 - F^2)b^2/k_BT$ for a Gaussian chain in one dimension, we recover Eqs. (26) and (27). Thus, the predictions obtained by studying directly the evolution equation (18) of the partition function coincide with those obtained by integrating out most degrees of freedom to arrive at a simplified formulation in terms of the unzipping energy $\mathcal{E}(m)$.
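The statement that the integral over $m$ reproduces the exponential end-to-end distribution can also be checked numerically. The following is a minimal sketch rather than part of the original analysis: it assumes one spatial dimension, the Gaussian-chain value $f = (F_c^2 - F^2)b^2/k_BT$, and $\kappa_0 = F_c/k_BT$, and the function names and parameter values are illustrative choices.

```python
import math

def p_end_given_m(r, m, b, force, kT):
    """End-to-end distribution of 2m Gaussian segments under a force (d = 1)."""
    mean = 2.0 * m * b**2 * force / kT
    var = 2.0 * m * b**2
    return math.exp(-(r - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def p_end_numeric(r, b, force, f_c, kT, n_steps=20000):
    """Midpoint-rule integral over m of P(m) * P_0(r|m), with P(m) exponential."""
    f = (f_c**2 - force**2) * b**2 / kT   # assumed Gaussian-chain value of f (d = 1)
    m_max = 40.0 * kT / f                 # P(m) has decayed to exp(-40) at m_max
    dm = m_max / n_steps
    total = 0.0
    for i in range(n_steps):
        m = (i + 0.5) * dm
        total += (f / kT) * math.exp(-m * f / kT) * p_end_given_m(r, m, b, force, kT)
    return total * dm

def p_end_closed_form(r, force, f_c, kT):
    """Normalized d = 1 exponential form exp(F r / kT - kappa_0 |r|)."""
    kappa0 = f_c / kT
    g = force / kT
    norm = (kappa0**2 - g**2) / (2.0 * kappa0)
    return norm * math.exp(g * r - kappa0 * abs(r))

if __name__ == "__main__":
    for r in (-2.0, 1.0, 4.0):
        print(r, p_end_numeric(r, 1.0, 0.5, 1.0, 1.0), p_end_closed_form(r, 0.5, 1.0, 1.0))
```

In units where $b = k_BT = F_c = 1$, the two columns printed for each $r$ should agree closely, on both the side of the origin favored by the force and the side opposed to it.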

IV. DISORDER-AVERAGED BEHAVIOR 

In contrast to the entropically-driven opening of a homopolymer, the unzipping of a polymer with a random sequence is driven primarily by the possibility of lowering $\mathcal{E}(m)$ by unzipping a string of base pairs that are more weakly paired than the average. The two transitions are thus qualitatively different. To see this explicitly, consider a simple application of the Harris criterion for the importance of disorder [42]. The typical variation per monomer due to disorder in the base-pairing energy $\mathcal{E}(m)$ of a liberated section of length $\langle m \rangle$ is $(\Delta/\langle m \rangle)^{1/2} \sim \sqrt{F_c - F}$, where the $F$ dependence follows from the result (17) for the divergence of $\langle m \rangle$ near the transition for a homopolymer. These energy variations vanish more slowly as $F \to F_c$ than the average energy difference $f \sim F_c - F$ between the two phases, indicating that sequence randomness dominates at the unzipping transition.

A related argument can help us to guess the correct critical exponent for the divergence of $\langle m \rangle$ when disorder is present: The contribution to $\mathcal{E}(m)$ of the average energy difference is $mf$, while a typical favorable contribution from random variations about the average is of order $-\sqrt{\Delta m}$. The random part thus exceeds the average for $m < m^* = \Delta/f^2$. When this is the case, $\mathcal{E}(m)$ is roughly as likely to be negative as to be positive. One thus expects that a typical value of $\langle m \rangle$ will be at least of order $m^*$. Near enough to the unzipping transition at $f = 0$, $m^*$ is larger than the equilibrium average $k_BT/f$ for a non-random sequence. Instead of the $1/(F_c - F)$ divergence in $\langle m \rangle$ seen for a homopolymer, one might thus expect DNA with a random sequence to show a considerably stronger $1/(F_c - F)^2$ singularity. The crossover between the two scaling regimes should occur when $\Delta/f^2 \sim k_BT/f$, or when $f \sim \Delta/k_BT$. For dsDNA, both $\sqrt{\Delta}$ and the average base pairing energy $g_0$ are of order $k_BT$; we can estimate $f \sim |g'(F_c)|(F_c - F) \approx (|g_0|/F_c)(F_c - F)$. Hence, when $fk_BT/\Delta \sim O(1)$ at the crossover, the reduced force $(F_c - F)/F_c$ is also $O(1)$, confirming that disorder cannot be neglected in polynucleotide unzipping even for $F$ of order, say, $F_c/2$. As we shall see in Section VII, disorder affects the dynamics of unzipping for a similar range above $F_c$.

This scaling argument can be extended to the case of random DNA sequences with long-ranged correlations (as may be the case for noncoding DNA). If the correlations between nucleotides separated by $m$ base pairs decay like $1/m^\gamma$, the fluctuation $\eta(m)$ around the average energy to open a base pair will likewise have a correlation function $\overline{\eta(m)\eta(m')} \sim 1/|m - m'|^\gamma$. For $\gamma < 1$, the mean-squared value of $\int_0^m dm'\,\eta(m')$ then grows like $\int_0^m dm' \int_0^m dm''\, 1/|m' - m''|^\gamma \sim m^{2-\gamma}$. A typical random contribution to $\mathcal{E}(m)$ then increases as $m^{1-\gamma/2}$; balancing this random energy against $mf$ suggests that $\langle m \rangle \sim m^* \sim f^{-2/\gamma}$. If we take, for example, $\gamma = 2/3$, then $\langle m \rangle \sim 1/f^3$, an even stronger divergence.

To verify our scaling argument for the case of a random, uncorrelated base sequence, we begin by calculating the disorder-averaged number of bases opened $\overline{\langle m \rangle}$ (as before, the overbar indicates an average over different random base sequences). Fluctuations about this average will be studied in more detail in the next section. To find $\overline{\langle m \rangle}$, one must first compute the average free energy $-k_BT\,\overline{\ln Z}$; disorder-averaged cumulants of $m$ can then be obtained by taking derivatives with respect to $f$. Remarkably, the entire distribution of $Z$ can be found exactly by treating the random energy $\eta$ as a Langevin noise. Several variations on this procedure have appeared in other physical contexts, and related approaches to the same formal problem have been developed elsewhere [44].

We begin by defining the partition function of a polymer of finite length $m$:

$$Z(m) = \int_0^m dm'\, \exp\left[-\frac{\mathcal{E}(m')}{k_BT}\right] . \tag{31}$$

The partition function $Z$ of interest to us is recovered by taking the limit of an infinite length polymer: $Z = \lim_{m\to\infty} Z(m)$. The derivative of $Z$ is simply

$$\frac{dZ}{dm} = e^{-\mathcal{E}(m)/k_BT} , \tag{32}$$

with initial condition $Z(0) = 0$. Similarly, the derivative of $\mathcal{E}(m)$ is, from Eq. (13),

$$\frac{d\mathcal{E}}{dm} = f + \eta(m) , \tag{33}$$

with initial condition $\mathcal{E}(0) = 0$. Eqs. (32) and (33) make up a system of coupled Langevin equations, analogous, for example, to those describing the Brownian motion of a massive particle, with $\mathcal{E}$ playing the role of momentum and $Z$ that of position. They can be transformed in the usual manner into an equivalent Fokker-Planck equation for the joint probability distribution $P(\mathcal{E}, Z, m)$ of $\mathcal{E}$ and $Z$ at "time" $m$:



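$$\frac{\partial P}{\partial m} = \frac{\Delta}{2}\frac{\partial^2 P}{\partial \mathcal{E}^2} - f\frac{\partial P}{\partial \mathcal{E}} - e^{-\mathcal{E}/k_BT}\frac{\partial P}{\partial Z} . \tag{34}$$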



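To solve Eq. (34) in the limit of large $m$, we first Laplace transform with respect to $Z$ and to $m$, with conjugate variables $\lambda$ and $s$, respectively. The resulting ordinary differential equation for the transformed distribution $\tilde{P}(\mathcal{E}; \lambda, s)$ takes the form

$$\frac{\Delta}{2}\frac{d^2\tilde{P}}{d\mathcal{E}^2} - f\frac{d\tilde{P}}{d\mathcal{E}} - \left(s + \lambda e^{-\mathcal{E}/k_BT}\right)\tilde{P} = -\delta(\mathcal{E}) . \tag{35}$$

The change of variables

$$x = k_BT\left[\frac{8\lambda}{\Delta}\right]^{1/2} e^{-\mathcal{E}/2k_BT} \tag{36}$$

leads to an inhomogeneous Bessel equation

$$x^2\frac{d^2\tilde{P}}{dx^2} + \left(1 + \frac{4fk_BT}{\Delta}\right)x\frac{d\tilde{P}}{dx} - \left[x^2 + \frac{8s(k_BT)^2}{\Delta}\right]\tilde{P} = -\frac{4x_0k_BT}{\Delta}\,\delta(x - x_0) , \tag{37}$$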



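where $x_0 = x|_{\mathcal{E}=0} = k_BT\sqrt{8\lambda/\Delta}$. Although $\mathcal{E}$ has been replaced by $x$, $\tilde{P}$ remains normalized as a function of $\mathcal{E}$. One can easily check that the solution of Eq. (37) follows the usual form for the Green's function of a Sturm-Liouville equation:

$$\tilde{P}(x; \lambda, s) = \frac{4k_BT}{\Delta}\left(\frac{x_0}{x}\right)^{2fk_BT/\Delta} \begin{cases} K_\nu(x_0)\,I_\nu(x), & x < x_0 \\ I_\nu(x_0)\,K_\nu(x), & x > x_0 , \end{cases} \tag{38}$$

where $I_\nu$ and $K_\nu$ are modified Bessel functions, and

$$\nu = \sqrt{\left(\frac{2fk_BT}{\Delta}\right)^2 + \frac{8s(k_BT)^2}{\Delta}} . \tag{39}$$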



Eq. (38) represents an exact solution to our single degree of freedom model. We are interested primarily in the distribution of $Z$ for large $m$, so we would like to integrate over all $\mathcal{E}$ and then take the limit $m \to \infty$. The first task can easily be accomplished on a formal level:

$$\tilde{P}(\lambda, s) = \int_{-\infty}^{\infty} d\mathcal{E}\, \tilde{P}(\mathcal{E}; \lambda, s) = \frac{8(k_BT)^2}{\Delta}\left[K_\nu(x_0)\int_0^{x_0}\frac{dx}{x}\left(\frac{x_0}{x}\right)^{2fk_BT/\Delta} I_\nu(x) + I_\nu(x_0)\int_{x_0}^{\infty}\frac{dx}{x}\left(\frac{x_0}{x}\right)^{2fk_BT/\Delta} K_\nu(x)\right] . \tag{40}$$



Because $\mathcal{E}(m)$ grows linearly with $m$ below the unzipping transition for large enough $m$, the contributions to the partition function $Z$ of the parts of the dsDNA at very large $m$ are exponentially suppressed. Hence, we expect that $Z$ must have a well-defined limiting distribution as $m \to \infty$. This, in turn, implies that the Laplace transform $\tilde{P}(\lambda, s)$ should diverge like $1/s$ as $s \to 0$, or equivalently, as $\nu \to 2fk_BT/\Delta$. An examination of Eq. (40) reveals that this is in fact the case. Specifically, $I_\nu(x) \sim x^\nu$ for small $x$, so the integral from $0$ to $x_0$ diverges when $\nu$ approaches $2fk_BT/\Delta$. This singularity dominates the large $m$ behavior of the inverse Laplace transform with respect to $s$, allowing us to perform the inversion analytically:

$$\tilde{P}(\lambda; m \to \infty) = \frac{2}{\Gamma(2fk_BT/\Delta)}\left[\frac{2\lambda(k_BT)^2}{\Delta}\right]^{fk_BT/\Delta} K_{2fk_BT/\Delta}\left(k_BT\sqrt{8\lambda/\Delta}\right) , \tag{41}$$



where we have substituted $x_0 = k_BT\sqrt{8\lambda/\Delta}$. Note that the asymptotics are completely determined by the small $x$ behavior of $\tilde{P}(x; \lambda, s)$. Because small $x$ corresponds to large $\mathcal{E}$, this is quite reasonable: It follows directly from Eq. (33) that the distribution of $\mathcal{E}(m)$ is a Gaussian centered at $mf$, so only very large $\mathcal{E}$ will have any weight for large $m$.

To evaluate the disorder-averaged free energy, we must invert the Laplace transform $\tilde{P}(\lambda; m \to \infty)$ to obtain the distribution $P(Z)$ of the partition function. With the aid of various Bessel function identities, one discovers that the integral can be evaluated analytically. The result is the distribution over possible random sequences of the partition function $Z$ of our minimal unzipping model [43]:

$$P(Z) = \frac{1}{\Gamma(2fk_BT/\Delta)}\left[\frac{2(k_BT)^2}{\Delta}\right]^{2fk_BT/\Delta}\frac{1}{Z^{1+2fk_BT/\Delta}}\exp\left[-\frac{2(k_BT)^2}{Z\Delta}\right] . \tag{42}$$



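The disorder-averaged free energy follows immediately by integration; with the substitution $y = 2(k_BT)^2/(Z\Delta)$, one has

$$-k_BT\,\overline{\ln Z} = k_BT\left[\frac{1}{\Gamma(2fk_BT/\Delta)}\int_0^\infty dy\, y^{2fk_BT/\Delta - 1}\ln(y)\,e^{-y} + \ln\frac{\Delta}{2(k_BT)^2}\right] . \tag{43}$$

Taking a derivative with respect to $f$ yields the main quantity of interest:

$$\overline{\langle m \rangle} = -k_BT\frac{\partial\,\overline{\ln Z}}{\partial f} = \frac{2(k_BT)^2}{\Gamma(2fk_BT/\Delta)\,\Delta}\int_0^\infty dy\, y^{2fk_BT/\Delta - 1}(\ln y)^2 e^{-y} - \frac{2(k_BT)^2\,\Gamma'(2fk_BT/\Delta)^2}{\Gamma(2fk_BT/\Delta)^2\,\Delta} , \tag{44}$$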



where $\Gamma'(z) = d\Gamma/dz$. This function is plotted in Figure 7. In agreement with our earlier scaling argument, there is a crossover from $1/f$ to $1/f^2$ behavior at $f$ of order $\Delta/k_BT$. Indeed, one can analytically extract the asymptotic small $f$ behavior from Eq. (44). One finds that to leading order as $f \to 0$,

$$\overline{\langle m \rangle} \simeq \frac{\Delta}{2f^2} \sim \frac{1}{(F_c - F)^2} \quad \text{(random heteropolymer)} . \tag{45}$$



Additional results follow for the higher cumulants of $m$. For example, the disorder-averaged variance of $m$ can be found from the second derivative of $\overline{\ln Z}$. For small $f$, $\overline{\langle m^2 \rangle - \langle m \rangle^2} = (k_BT)^2\,\partial^2\overline{\ln Z}/\partial f^2 \sim 1/f^3$. The square root of this variance is a length scale that can be compared to $\overline{\langle m \rangle}$. In the non-random case, both quantities are of order $k_BT/f$. In contrast, once sequence randomness is added, we have that $(\overline{\langle m^2 \rangle - \langle m \rangle^2})^{1/2} \sim 1/f^{3/2}$, which is much smaller than $\overline{\langle m \rangle}$ for sufficiently small $f$. Thermal fluctuations about $\langle m \rangle$ in a given random heteropolymer thus become small compared to the mean near the transition. As we shall see in the next section, this fact allows us to predict not just disorder-averaged quantities (it might be tedious to average over all possible sequences in a real experiment!), but also the unzipping behavior of a single dsDNA molecule.
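The crossover between the two scaling regimes can be made concrete numerically. The sketch below is an illustration only: it uses the identity that the combination of Gamma-function integrals in the disorder-averaged $\overline{\langle m \rangle}$ equals the trigamma function $\psi'(a)$ with $a = 2fk_BT/\Delta$, so that $\overline{\langle m \rangle} = [2(k_BT)^2/\Delta]\,\psi'(a)$. The helper names `trigamma` and `mean_opened` are hypothetical, and the trigamma routine is a simple recurrence-plus-asymptotic-series approximation written for this example.

```python
import math

def trigamma(x):
    """Trigamma function psi'(x) for x > 0 (recurrence plus asymptotic series)."""
    total = 0.0
    while x < 6.0:                       # psi'(x) = psi'(x + 1) + 1/x^2
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    series = inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0)))
    return total + series

def mean_opened(f, delta, kT=1.0):
    """Disorder-averaged number of open bases, written via the trigamma function."""
    a = 2.0 * f * kT / delta
    return 2.0 * kT**2 / delta * trigamma(a)

if __name__ == "__main__":
    delta = 1.0
    for f in (1e-3, 1e-2, 1e-1, 1.0, 10.0, 50.0):
        print(f, mean_opened(f, delta), delta / (2 * f * f), 1.0 / f)
```

For $f \ll \Delta/k_BT$ the first column of results tracks $\Delta/2f^2$, while for $f \gg \Delta/k_BT$ it approaches the homopolymer value $k_BT/f$.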
V. FORCE-DISPLACEMENT CURVE FOR A SINGLE POLYNUCLEOTIDE DUPLEX

Figure 8 plots the average number of unzipped bases $\langle m \rangle$ versus force near the unzipping transition for simulations of four different dsDNA molecules, with different random sequences [46]. The corresponding energy landscapes for a force close to $F_c$ are shown in Fig. 9. Far from being smooth, each $\langle m \rangle$ versus $f$ curve shows long plateaus, where $\langle m \rangle$ remains essentially constant, separated by sudden, large jumps. The smoothly diverging precursor to the phase transition seen in homopolymers and in the disorder-average $\overline{\langle m \rangle}$ has evidently been replaced by a series of "micro-first-order transitions." The four traces, moreover, are not the same: the unzipping of a single random dsDNA does not exhibit self-averaging, but instead shows large sequence-dependent variations. Most equilibrium systems with quenched disorder are self-averaging because the macroscopic observables of interest are the sums of contributions from many essentially independent correlation volumes, each with its own independent realization of the quenched random variables; the central limit theorem then guarantees that in the thermodynamic limit, measurements will always coincide with the disorder average. In a single molecule DNA unzipping experiment, in contrast, one is probing only one realization of the quenched random sequence. As Figure 9 indicates, each random realization of $\mathcal{E}(m)$ will be different, and the value of $\langle m \rangle$ at a given $f$ can thus be expected to differ from one polymer to the next. Furthermore, for each sequence, $\mathcal{E}(m)$ varies over many tens of $k_BT$; one thus might expect that $m$ would not fluctuate very far from the minima. Figure 8 bears out this idea: The location $m_{\min}$ of the absolute minimum of $\mathcal{E}(m)$ for each value of $f$ coincides remarkably well with $\langle m \rangle$. Because $\mathcal{E}(m)$ is usually negative at these minima, the dsDNA gains energy by unzipping some bases at its end, even below the bulk unzipping transition. This mechanism contrasts with the essentially entropic impetus for surface opening in the case of a homopolymer. We show in this section that, near enough to the transition, $\langle m \rangle$ for a given DNA or RNA duplex coincides with $m_{\min}$ with arbitrary precision and that this fact can be used to gain a quantitative understanding of the abrupt jumps seen in Figure 8. We will usually work in the continuum approximation, with the probability $P(\mathcal{E}, m)$ of finding an energy $\mathcal{E}$ after opening $m$ bases satisfying a diffusion-like equation,

$$\frac{\partial P}{\partial m} = \frac{\Delta}{2}\frac{\partial^2 P}{\partial \mathcal{E}^2} - f\frac{\partial P}{\partial \mathcal{E}} . \tag{46}$$

This result follows directly from Eq. (33) or from integrating the full Fokker-Planck equation (34) with respect to $Z$. At each $m$, $\mathcal{E}(m)$ thus has a Gaussian distribution; because our results do not depend on the tails of this distribution, they should be equally valid for more realistic, discrete models of dsDNA.
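The plateau-and-jump structure is easy to reproduce in a toy calculation. The following is a minimal sketch under stated assumptions, not the simulation used for the figures: it takes a discrete landscape $\mathcal{E}(m) = fm + \sum_{i \le m}\eta_i$ with independent Gaussian $\eta_i$ of variance $\Delta$, a finite duplex of `n_bases` base pairs, and units with $k_BT = 1$; the function names and parameter values are illustrative.

```python
import math
import random

def random_landscape(n_bases, delta, seed):
    """Cumulative random part W(m) of the unzipping energy for one sequence."""
    rng = random.Random(seed)
    sigma = math.sqrt(delta)
    w = [0.0]
    for _ in range(n_bases):
        w.append(w[-1] + rng.gauss(0.0, sigma))
    return w

def unzipping_stats(w, f, kT=1.0):
    """Return (thermal average <m>, location m_min of the absolute minimum)."""
    energies = [w_m + f * m for m, w_m in enumerate(w)]
    e_min = min(energies)
    m_min = energies.index(e_min)
    # Subtract e_min before exponentiating to avoid overflow.
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)
    mean_m = sum(m * wt for m, wt in enumerate(weights)) / z
    return mean_m, m_min

if __name__ == "__main__":
    w = random_landscape(3000, delta=1.0, seed=1)
    for f in (0.5, 0.4, 0.3, 0.25, 0.2, 0.15):
        mean_m, m_min = unzipping_stats(w, f)
        print(f"f = {f:4.2f}   <m> = {mean_m:8.2f}   m_min = {m_min}")
```

Scanning $f$ downward on a fine grid for a single seed typically shows $m_{\min}$ holding constant over ranges of $f$ and then jumping, with $\langle m \rangle$ following it; changing the seed changes where the jumps occur.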

A. Dominance of the Absolute Free Energy Minimum 

We begin by arguing that, close to the transition, the location $m_{\min}$ of the absolute minimum of $\mathcal{E}(m)$ is in fact the same as $\langle m \rangle$. More precisely, we wish to show that, for a random DNA sequence,

$$\lim_{f \to 0}\frac{\langle m \rangle}{m_{\min}} = 1 \quad \text{with probability 1.} \tag{47}$$

In qualitative terms, one might expect this result to hold because the scale of $\mathcal{E}(m)$ grows like the square root of the distance from the minimum; it is thus very unlikely that $\mathcal{E}(m)$ will revisit the neighborhood of its minimum value for $m$ far from the location of the original minimum. Here, we simply outline the arguments necessary to support this intuition; closely related theorems have, however, been proven with mathematical rigor [47]. We will proceed by first considering scenarios in which Eq. (47) would not hold, then showing that the probability of each such event vanishes as $f \to 0$. In renormalization-group language, Eq. (47) can be read as stating that the unzipping transition for a random dsDNA sequence is governed by a zero-temperature fixed-point; such fixed points have been found in a number of other random systems.

The simplest way that $m_{\min}$ and $\langle m \rangle$ could differ is for $m_{\min}$ to equal 0; since $\langle m \rangle$ is necessarily positive, their ratio would then be infinite. The probability that $m_{\min} = 0$ is the same as the probability that the biased random walk $\mathcal{E}(m)$, which starts at $\mathcal{E}(0) = 0$, has $\mathcal{E}(m) > 0$ for all $m > 0$. More generally, the probability that $\mathcal{E}(m) > 0$ for all $m > 0$ for a random walk starting at $\mathcal{E}(0) = \mathcal{E}_0$ is known in the literature on first passage problems as the "splitting probability" $\pi(\mathcal{E}_0)$. The splitting probability satisfies an equation involving the adjoint of the diffusion operator [45],

$$\frac{\Delta}{2}\frac{d^2\pi}{d\mathcal{E}_0^2} + f\frac{d\pi}{d\mathcal{E}_0} = 0 . \tag{48}$$

The solution of this equation with boundary conditions $\pi(0) = 0$ and $\pi(\infty) = 1$ is $\pi(\mathcal{E}_0) = 1 - \exp(-2\mathcal{E}_0 f/\Delta)$. The requirement that $\pi(0) = 0$ is an artifact of the behavior of a continuous time random walk as $m \to 0$: Because $\mathcal{E}(m)$ experiences small jumps up and down on all scales, a random walk that starts at $\mathcal{E}(0) = 0$ will pass below the line $\mathcal{E} = 0$ many times for very small $m$. This behavior is not relevant to real DNA with discrete bases, and we can regularize it by considering, instead of a random walk that starts exactly at $\mathcal{E}_0 = 0$, one that starts slightly above 0. For small $\mathcal{E}_0$, $\pi(\mathcal{E}_0) \approx 2\mathcal{E}_0 f/\Delta$, so the splitting probability vanishes linearly as $f \to 0$. Indeed, for any $\mathcal{E}_0$, $\pi(\mathcal{E}_0)$ goes to zero linearly for small $f$, as one might expect based on the well-known result that a completely unbiased random walk in one dimension must eventually visit the entire real line. The same linear behavior for small enough bias is seen in random walks on one-dimensional lattices [45]. We thus conclude that the probability that $m_{\min} = 0$ is proportional to $f$ and can be neglected as $f \to 0$.
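As a quick consistency check on the splitting probability, one can verify numerically that $\pi(\mathcal{E}_0) = 1 - \exp(-2\mathcal{E}_0 f/\Delta)$ satisfies the adjoint equation $(\Delta/2)\,\pi'' + f\,\pi' = 0$ and is linear in $f$ for small bias. The sketch below uses central finite differences; the function names, step size, and parameter values are illustrative assumptions.

```python
import math

def splitting_probability(e0, f, delta):
    """pi(E_0) = 1 - exp(-2 E_0 f / Delta), the chance never to drop below zero."""
    return -math.expm1(-2.0 * e0 * f / delta)

def adjoint_residual(e0, f, delta, h=1e-3):
    """Finite-difference value of (Delta/2) pi'' + f pi' at E_0 (should be ~0)."""
    p_plus = splitting_probability(e0 + h, f, delta)
    p_mid = splitting_probability(e0, f, delta)
    p_minus = splitting_probability(e0 - h, f, delta)
    second = (p_plus - 2.0 * p_mid + p_minus) / (h * h)
    first = (p_plus - p_minus) / (2.0 * h)
    return 0.5 * delta * second + f * first

if __name__ == "__main__":
    print("residual:", adjoint_residual(1.0, 0.3, 1.0))
    for f in (0.2, 0.1, 0.05):
        # The ratio pi / f approaches 2 E_0 / Delta as f -> 0.
        print(f, splitting_probability(0.5, f, 1.0) / f)
```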

Now consider other possible values of $m_{\min}$. We shall see in the next subsection that the distribution of $m_{\min}$ for $m_{\min} > 0$ is a function of the dimensionless ratio $m_{\min}f^2/\Delta$. The probability that $m_{\min} \sim O(1/f^\beta)$, with $\beta \neq 2$, hence becomes negligible for small $f$, and we need only consider $m_{\min} \sim O(1/f^2)$. For the absolute minimum and the thermal average not to coincide in this case, there must be a local minimum nearly degenerate with $\mathcal{E}(m_{\min})$ a distance $O(1/f^2)$ away from $m_{\min}$. Note in particular that a degenerate minimum closer to $m_{\min}$ than $O(1/f^2)$ will contribute an additive correction to $\langle m \rangle$ that is much smaller than $m_{\min} \sim O(1/f^2)$ for small enough $f$, and thus will not affect the ratio $\langle m \rangle/m_{\min}$ as $f \to 0$. The same holds true for thermal fluctuations in the well surrounding $m_{\min}$.

We can rephrase the question of the existence of degenerate minima as follows: What is the probability that, for a given positive $E$ and $\epsilon$,

$$\mathcal{E}(m) > \mathcal{E}(m_{\min}) + E \quad \text{for all } m \text{ such that } |m - m_{\min}| > \epsilon\Delta/f^2\ ? \tag{49}$$

If this inequality is satisfied, then $\langle m \rangle/m_{\min} - 1$ is at most the sum of a term of order $\epsilon$ and of a term of order $\exp(-E/k_BT)$; if for any choice of $E$ and $\epsilon$ the probability that it is satisfied can be made arbitrarily close to 1 for $f$ small enough, then Eq. (47) must hold. One can easily argue from dimensional analysis that this is the case: The probability $P_{\rm ineq.}$ that the inequality Eq. (49) holds is a function of the dimensionless parameter $\epsilon$ and of the three parameters $E$, $\Delta$, and $f$, with dimensions, respectively, of energy, (energy)$^2$/nucleotide, and energy/nucleotide. Because $P_{\rm ineq.}$ is itself dimensionless, it must depend only on dimensionless ratios of the latter three parameters; by rescaling energies and nucleotide numbers, one can easily conclude that the only such ratio is $Ef/\Delta$. Hence, $P_{\rm ineq.} = P_{\rm ineq.}(\epsilon, Ef/\Delta)$. Moreover, we know that $m_{\min}$ is the absolute minimum of the random walk, so it must be true that $P_{\rm ineq.} = 1$ when $E = 0$. As long as $P_{\rm ineq.}(\epsilon, Ef/\Delta)$ is a continuous function of its second argument, it must then be true that $P_{\rm ineq.} \to 1$ as $f \to 0$ for any fixed $\epsilon$ and $E$. This is sufficient to confirm that $m_{\min}$ and $\langle m \rangle$ coincide with probability 1 for small $f$. If $P_{\rm ineq.}$ has a well-defined first derivative, then $1 - P_{\rm ineq.} \sim Ef/\Delta$ for small $f$, a result that can be verified by a more detailed calculation.

This linear dependence has a simple interpretation: For an unbiased random walk, the probability to make a first return to the starting point after $m$ steps decays like $1/m^{3/2}$; this is also approximately the case for a biased random walk on scales smaller than $\sim \Delta/f^2$. Upon integrating $1/m^{3/2}$ from $\epsilon\Delta/f^2$ to some large upper bound, we see that the probability not to return at all (and thus not to have any minima nearly degenerate with $m_{\min}$) differs from 1 by a number of order $f$. Our earlier observation that $\overline{\langle m^2 \rangle - \langle m \rangle^2}$ goes like $1/f^3$ can also be explained by the small $f$ behavior of $1 - P_{\rm ineq.}$: The disorder average is dominated by the probability of order $f$ that $\langle m^2 \rangle - \langle m \rangle^2$ will be of order $1/f^4$. The notion that disorder-averages of higher cumulants can be determined by rare configurations of the disorder in which there are two widely-separated minima has been explored in several other random systems [48, 49].



B. Statistics of Minima: Plateaus and Jumps 

Having determined that the absolute minimum $m_{\min}$ of $\mathcal{E}(m)$ and the average number $\langle m \rangle$ of bases opened coincide near the unzipping transition, we can now use this fact to study the $\langle m \rangle$ versus $f$ curve for a single random sequence. Consider the effect on the energy landscape $\mathcal{E}(m)$ describing a given dsDNA molecule, with a given random sequence, of tuning the bias $f$ towards zero. Decreasing $f$ gradually tilts the energy landscape towards the horizontal, as illustrated in Figure 10. The location of the absolute minimum will then remain constant over a range of $f$, giving rise to the observed plateaus. As the landscape tilts, however, local minima at larger values of $m$ move downwards faster than those at smaller $m$. At certain specific values of $f$, the energy of a minimum at $m > m_{\min}$ will move below $\mathcal{E}(m_{\min})$, and $m_{\min} \approx \langle m \rangle$ will shift from the old minimum to the new one. As Figure 10 shows, the two minima can be separated by a considerable distance, thus giving a physical explanation for the dramatic jumps seen in Figure 8.

To develop a quantitative theory of these effects, we begin by calculating the distribution of $m_{\min}$ for a given $f$, then consider the conditional probability that $m_{\min} = m_2$ when $f = f_2$, given that the minimum was at $m_1$ at a bias $f_1$. This conditional distribution will allow us to make predictions, for example, about the typical sizes of plateaus and of jumps.

We first ask for the probability $P_{\min}(m_{\min}, \mathcal{E}_{\min})$ that $\mathcal{E}(m)$ has its absolute minimum at $(m_{\min}, \mathcal{E}_{\min})$, or equivalently the probability that $\mathcal{E}(m)$ first reaches $\mathcal{E}_{\min}$ at "time" $m_{\min}$, multiplied by the probability that $\mathcal{E}(m) > \mathcal{E}_{\min}$ for $m > m_{\min}$. The latter is simply the splitting probability $\pi$ introduced in the last subsection. Although in the continuum approximation $\pi(\mathcal{E}_0)$ is singular as $\mathcal{E}_0 \to \mathcal{E}_{\min}$, we can regularize it in a manner similar to that used previously. Because $\pi$ is just a constant factor, independent of $\mathcal{E}_{\min}$, the details of the regularization are unimportant. In practice, it can be determined by demanding that $P_{\min}(\mathcal{E}_{\min}, m_{\min})$ be correctly normalized.

More interesting is the probability of first passage to $\mathcal{E}_{\min}$. We first define the probability $S(\mathcal{E}, m; \mathcal{E}_{\min})$ that, starting from $\mathcal{E} = 0$ at $m = 0$, the random walk has arrived at energy $\mathcal{E}$ after opening $m$ bases, without ever having had $\mathcal{E}(m) < \mathcal{E}_{\min}$. It turns out that $S$ satisfies the same Fokker-Planck equation (46) as the probability $P(\mathcal{E}, m)$ for the unconstrained random walk to arrive at $(m, \mathcal{E})$ [49]. The constrained probability $S$, however, is also subject to the boundary condition $S(\mathcal{E}_{\min}, m; \mathcal{E}_{\min}) = 0$. With this boundary condition, one can solve the Fokker-Planck equation by the method of images to find

$$S(\mathcal{E}, m; \mathcal{E}_{\min}) = \frac{1}{\sqrt{2\pi\Delta m}}\left\{\exp\left[-\frac{(\mathcal{E} - fm)^2}{2\Delta m}\right] - \exp\left(\frac{2f\mathcal{E}_{\min}}{\Delta}\right)\exp\left[-\frac{(\mathcal{E} - 2\mathcal{E}_{\min} - fm)^2}{2\Delta m}\right]\right\} . \tag{50}$$

The probability to first cross $\mathcal{E}_{\min}$ after $m_{\min}$ steps is then given by $(\Delta/2)\,\partial S/\partial\mathcal{E}|_{\mathcal{E} = \mathcal{E}_{\min}}$, i.e. the diffusive flux of random walkers crossing $\mathcal{E}_{\min}$ for the first time. The distribution $P_{\min}(\mathcal{E}_{\min}, m_{\min})$ differs from this function only by a normalization factor. Finally, we determine the probability that the minimum occurs at $m_{\min}$ for any $\mathcal{E}_{\min}$ by integrating from $-\infty$ to $0$ with respect to $\mathcal{E}_{\min}$. The final result is

$$P_{\min}(m_{\min}) = \frac{f^2}{\Delta}\left[\sqrt{\frac{2\Delta}{\pi f^2 m_{\min}}}\; e^{-f^2 m_{\min}/2\Delta} - \mathrm{erfc}\left(f\sqrt{\frac{m_{\min}}{2\Delta}}\right)\right] ,$$

in agreement with the distribution obtained by le Doussal et al. using a real space renormalization group [49]. Note from Fig. 11 that $P_{\min}(m_{\min})$ agrees (to within counting errors) with the distribution of $\langle m \rangle$ obtained from simulations. As claimed above, $P_{\min}(m_{\min})$ takes the form of a scaling function of $m_{\min}f^2/\Delta$. Variations in $m_{\min} \approx \langle m \rangle$ between different random sequences are thus of the same order as the average $\overline{\langle m \rangle}$, and the system is not self-averaging.
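The scaling form of the distribution of minima can be sanity-checked against the disorder-averaged result $\overline{\langle m \rangle} \simeq \Delta/2f^2$. The sketch below is a hedged illustration: it assumes that the first-passage construction just described leads to the scaling density $p(t) = \sqrt{2/\pi t}\,e^{-t/2} - \mathrm{erfc}(\sqrt{t/2})$ in the variable $t = m_{\min}f^2/\Delta$ (obtained from the image solution for a biased walk with an absorbing boundary), and it checks numerically that this density is normalized with mean $1/2$. The function names are illustrative.

```python
import math

def p_min_scaled(t):
    """Assumed scaling density of the minimum location, t = m_min f^2 / Delta."""
    return (math.sqrt(2.0 / (math.pi * t)) * math.exp(-0.5 * t)
            - math.erfc(math.sqrt(0.5 * t)))

def moment(k, u_max=12.0, n_steps=50000):
    """Integral of t^k p(t) dt, using t = u^2 to remove the 1/sqrt(t) singularity."""
    du = u_max / n_steps
    total = 0.0
    for i in range(n_steps):
        u = (i + 0.5) * du
        t = u * u
        total += (t ** k) * p_min_scaled(t) * 2.0 * u
    return total * du

if __name__ == "__main__":
    print("normalization:", moment(0))
    print("mean of m_min f^2 / Delta:", moment(1))
```

A mean of $1/2$ in the scaled variable corresponds to an average minimum location of $\Delta/2f^2$, consistent with identifying $m_{\min}$ with $\langle m \rangle$ near the transition.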

We now turn our attention to the more interesting and experimentally relevant question of correlations within a 
single (m) versus / curve. In particular, we would like to know the probability that £ (to) has its minimum at 777,2 at 
a bias fi given that, for the same realization 77(777) of the random base sequence, the minimum was at toi at a bias 
/1 > fi- This probability will turn out to depend only on the jump size TOj ump = 777-2 ~ Toi. The plateaus seen in 
Fig. @ suggest a delta function contribution at TOj ump = 0. To determine the strength of this delta function, consider 
a polymer with a fixed base sequence giving rise to an energy landscape 
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If the minimum of £ (to) is at 7771 for bias /1, then W(m) + f\m > W{m\) + /1TO1 = £\ for all to, and hence 
W(m) + /2TO > W(rn\) + /27771 = £2 for m < mi and fi < f\. If the minimum is to move from Toi as the bias is 
tuned down to /2, it must move towards larger 777. This is not surprising — one can easily prove that d(m)/df < 0. 

Let $\Pi_1$ and $\Pi_2$ denote the events, respectively, that for $m > m_1$, $W(m) + f_1 m > \mathcal{E}_1$ and $W(m) + f_2 m > \mathcal{E}_2$. The probabilities that $\Pi_1$ and $\Pi_2$ occur are simply the splitting probabilities $\pi_1 \propto f_1$ and $\pi_2 \propto f_2$. If the minimum of the random walk falls at $m_1$ for a bias $f_1$, then $\Pi_2$ is true if and only if the minimum remains at $m_1$ at the bias $f_2$. In other words, the coefficient of the delta function at $m_{\rm jump} = 0$ in the distribution of $m_{\rm jump}$ is simply the conditional probability ${\rm Prob}[\Pi_2|\Pi_1]$. From Bayes' theorem [52], we know that the probability that events $\Pi_1$ and $\Pi_2$ both occur for the same random sequence is ${\rm Prob}[\Pi_2 \wedge \Pi_1] = {\rm Prob}[\Pi_2|\Pi_1]\,{\rm Prob}[\Pi_1]$. But if $\Pi_2$ occurs, then $\Pi_1$ must also occur: if the random walk never passes below its value at $m_1$ with the smaller bias $f_2$, then it can never do so with the larger bias $f_1$. Thus, ${\rm Prob}[\Pi_2 \wedge \Pi_1] = {\rm Prob}[\Pi_2]$. The conditional probability thus takes the simple form

$${\rm Prob}[\Pi_2|\Pi_1] = \frac{{\rm Prob}[\Pi_2]}{{\rm Prob}[\Pi_1]} = \frac{\pi_2}{\pi_1} = \frac{f_2}{f_1} , \tag{53}$$

and the probability that a plateau stretches from $f_1$ down to $f_2$ is just $f_2/f_1$. Upon taking a derivative with respect to $f_2$, we conclude that the end point of a plateau that starts at a bias $f_{\rm start}$ is uniformly distributed between 0 and $f_{\rm start}$. Equivalently, the log ratio $l = \ln(f_{\rm start}/f_{\rm stop})$ of the starting and ending biases of a plateau is distributed as $\exp(-l)$.
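The monotonic motion of the minimum that underlies this argument can be illustrated with a minimal numerical sketch. The discrete $\pm 1$ random walk used for $W(m)$, the sequence length, the seed, and the function names below are illustrative assumptions, not part of the analysis above; the biases are chosen as multiples of 1/64 only so that $W(m) + fm$ is evaluated exactly in floating point.

```python
import random

def random_landscape(n_bases, seed=0):
    """Return W(0..n_bases): an unbiased +/-1 random walk with W(0) = 0 (assumed model)."""
    rng = random.Random(seed)
    walk = [0]
    for _ in range(n_bases):
        walk.append(walk[-1] + rng.choice((-1, 1)))
    return walk

def minimum_position(walk, bias):
    """Smallest m that minimizes E(m) = W(m) + bias * m."""
    best_m, best_e = 0, walk[0]
    for m, w in enumerate(walk):
        e = w + bias * m
        if e < best_e:
            best_m, best_e = m, e
    return best_m

def plateau_trace(walk, biases):
    """Position of the absolute minimum for each bias, in the order given."""
    return [(f, minimum_position(walk, f)) for f in biases]

# One fixed "sequence"; the bias is lowered from 0.5 to 1/64 in steps of 1/64.
WALK = random_landscape(4000, seed=1)
BIASES = [k / 64 for k in range(32, 0, -1)]
TRACE = plateau_trace(WALK, BIASES)

if __name__ == "__main__":
    for f, m in TRACE:
        print(f"f = {f:.4f}   m_min = {m}")
```

As the bias is lowered, the printed minimum either stays where it is (a plateau) or moves to a larger $m$ (a jump); it never moves back, in line with the inequality derived above.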
The distribution of plateau lengths, of course, is only part of the description of a plot of $\langle m\rangle$ versus $f$; to complete the characterization, we must also study the distribution $P_{\rm jump}$ of jumps $m_{\rm jump}$ for non-zero $m_{\rm jump}$. The full distribution of $m_{\rm jump}$ will then take the form $(f_2/f_1)\,\delta(m_{\rm jump}) + (1 - f_2/f_1)\,P_{\rm jump}(m_{\rm jump})$. The calculation of $P_{\rm jump}$ requires an extension of our previous first passage approach. As before, we are interested in the probability that the biased random walk $W(m) + f_2 m$ first reaches the energy $\mathcal{E}_2$ at $m_2 = m_1 + m_{\rm jump}$, but subject now to the additional constraint that $(m_1, \mathcal{E}_1)$ is the absolute minimum for the larger bias $f_1$. Hence, we demand that $W(m) + f_1 m > \mathcal{E}_1$ for all $m > m_1$, where $W(m)$ is the same fixed realization of the random energy landscape. To calculate this modified first passage probability, note that for each $m$, only one of the two conditions has to be taken into account. For $m_{\rm jump} < (\mathcal{E}_1 - \mathcal{E}_2)/(f_1 - f_2)$, $W(m) + f_1 m > \mathcal{E}_1$ is the stronger constraint on the allowed value of $\mathcal{E}(m)$, while for $m_{\rm jump} > (\mathcal{E}_1 - \mathcal{E}_2)/(f_1 - f_2)$, $W(m) + f_2 m > \mathcal{E}_2$ is the stronger. We can find the first passage probability, subject to both constraints, by multiplying the probability of arriving at $m_{\rm jump} = (\mathcal{E}_1 - \mathcal{E}_2)/(f_1 - f_2)$ subject to the first constraint by the probability of going from there to $(m_2, \mathcal{E}_2)$ subject to the second. Specifically, let $S_1(\mathcal{E}, m; \mathcal{E}_{\min})$ be the probability of arriving at $\mathcal{E}$ after opening $m$ bases, with bias $f_1$ and with $\mathcal{E}(m)$ always larger than $\mathcal{E}_{\min}$, and let $S_2(\mathcal{E}, m; \mathcal{E}_{\min})$ be the corresponding probability with bias $f_2$. Both probabilities are given by Eq. (50), with the appropriate substitution for $f$. As in our calculation of the distribution of minima, the derivative of $S$ is also an important quantity, so it is useful to define $S'_{1,2}(\mathcal{E}, m; \mathcal{E}_{\min}) = \partial S_{1,2}/\partial\mathcal{E}$. The probability that a random walk with bias $f_2$ will arrive at $(m, \mathcal{E})$, subject to the constraint that $W(m) + f_1 m > \mathcal{E}_1$, is related to $S_1$ by a "Galilean" transformation (with $m$ viewed as a time and $f_1 - f_2$ viewed as a velocity jump). Upon making use of the invariance of the $S$'s with respect to uniform translations in $\mathcal{E}$ and in $m$, one can thus write the probability that $(m_2, \mathcal{E}_2)$ is the minimum at bias $f_2$, given that $(m_1, \mathcal{E}_1)$ is the minimum at bias $f_1$, as

$$P_{\rm jump}(\mathcal{E}_2, m_2 | \mathcal{E}_1, m_1) \propto \int_{\mathcal{E}_2}^{\infty} d\mathcal{E}'\, S_1[\mathcal{E}' - \mathcal{E}_1 + (f_1 - f_2)(m' + m_1), m'; 0]\; S'_2(\mathcal{E}_2 - \mathcal{E}', m_2 - m_1 - m'; \mathcal{E}_2 - \mathcal{E}') , \tag{54}$$

where $m' = (\mathcal{E}_1 - \mathcal{E}_2)/(f_1 - f_2)$ is the value of $m_{\rm jump} = m_2 - m_1$ at which the two constraints switch precedence. The quantity $S_1[\mathcal{E}' - \mathcal{E}_1 + (f_1 - f_2)(m' + m_1), m'; 0]$, which is formally zero, is assumed to be regularized by replacing 0 by $-\epsilon$, and we have suppressed the normalization factor proportional to $\pi_2$. According to Eq. (54), $P_{\rm jump}$ depends only on the two biases $f_1$ and $f_2$ and on the differences $m_{\rm jump}$ and $\mathcal{E}_{\rm jump} = \mathcal{E}_2 - \mathcal{E}_1 + (f_1 - f_2) m_1$. The latter is the difference between $\mathcal{E}(m_1)$ and $\mathcal{E}(m_2)$, both defined with bias $f_2$; the extra factor proportional to $m_1$ is necessary because $\mathcal{E}_1$ is defined with bias $f_1$. It is straightforward to show that the conditional distributions of minima are Markovian; that is, the distribution of $m_2$ and $\mathcal{E}_2$ does not depend on the location of the absolute minimum for any $f > f_1$. Suppose that one were to ask for the distribution of $m_2$ and $\mathcal{E}_2$ subject not only to the constraint that at bias $f_1$, the minimum was at $(m_1, \mathcal{E}_1)$, but also that at a bias $f_0 > f_1$, the minimum was at $m_0 < m_1$ and $\mathcal{E}_0$, with $\mathcal{E}_1 + (f_0 - f_1) m_0 < \mathcal{E}_0 < \mathcal{E}_1 + (f_0 - f_1) m_1$. This additional demand translates into the condition that $W(m) + f_0 m > \mathcal{E}_0$ for $m > m_1$. This constraint, however, is weaker than the requirement $W(m) + f_1 m > \mathcal{E}_1$ imposed by the location of the minimum at the bias $f_1$. The distribution of $(m_2, \mathcal{E}_2)$ is thus independent of what happens at $f_0$, and the probability of a given sequence of measurements of $\langle m\rangle = m_{\min}$ for successive values of $f$ can be expressed as a product of factors of $P_{\rm jump}$.

To find the distribution of $m_{\rm jump}$ alone, and thus of $m_2$, one must integrate $P_{\rm jump}(m_{\rm jump}, \mathcal{E}_{\rm jump})$ with respect to $\mathcal{E}_{\rm jump}$ from $-(f_1 - f_2) m_2$ to 0. The lower bound reflects the constraint that $W(m_2) + f_1 m_2 > \mathcal{E}_1$; the upper bound ensures that $\mathcal{E}_2 < \mathcal{E}(m_1)$. Figure 12 compares a numerical calculation of the full distribution $P_{\rm jump}$ obtained in this way with simulation results. The good agreement confirms that $\langle m\rangle \approx m_{\min}$. The figure also shows that for large $m_{\rm jump}$, $P_{\rm jump}$ decays like $\exp(-m_{\rm jump} f_2^2/2\Delta)$. This is the same as the large $m_{\min}$ behavior of $P_{\min}$ with $f = f_2$; for large enough $m_{\rm jump}$, the constraint imposed by the minimum at $m_1$ has no effect on the distribution.

Additional analytic insight can be obtained by considering various limits. When $(f_1 - f_2)/f_2 \gg 1$, one finds that $P_{\rm jump}(m_{\rm jump}, \mathcal{E}_{\rm jump}) \simeq P_{\min}(m_{\rm jump}, \mathcal{E}_{\rm jump})$, where $P_{\min}$ is the distribution of the absolute minimum at a given value of $f$ discussed above [Eq. (51)], evaluated with $f = f_2$. In the limit of large $f_1 - f_2$, the lower bound on $\mathcal{E}_{\rm jump}$ approaches $-\infty$, and the integral of $P_{\rm jump}$ with respect to $\mathcal{E}_{\rm jump}$ introduces no extra complications. The distribution of $m_{\rm jump}$ is thus no different from that of the minimum $m_{\min}$ without any additional constraints. After normalization, we find

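$$P_{\rm jump}(m_{\rm jump}) \simeq P_{\min}(m_{\rm jump}) = \frac{f_2}{\sqrt{2\pi\Delta\, m_{\rm jump}}} \int_1^{\infty} \frac{dv}{v^{3/2}}\, e^{-v\, m_{\rm jump} f_2^2/2\Delta} , \qquad \frac{f_1 - f_2}{f_2} \gg 1 . \tag{55}$$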

Eq. (55) can be understood as follows: When $f_2$ is much smaller than $f_1$, the smaller bias allows the system to visit much more random sequence before the $fm$ term in $\mathcal{E}(m)$ makes the energy cost prohibitive. With so many more places where the new absolute minimum could occur, the constraint from the location of the old minimum at the larger bias becomes unimportant, and $P_{\rm jump}(m_{\rm jump})$ becomes independent of $m_1$. Indeed, because $m_2 \sim 1/f_2^2$ is typically much larger than $m_1 \sim 1/f_1^2$, $m_2$ differs very little from $m_{\rm jump}$. The distribution of $m_2$ thus approaches $P_{\min}(m_2)$. That is, the minimum at bias $f_2$ is essentially chosen independently from the same scaling distribution as the minimum at bias $f_1$. With a long enough sequence, it should in principle be feasible to make many such independent measurements of $m_{\min} \approx \langle m\rangle$ at different values of $f$. Although DNA unzipping is not self-averaging in the usual sense, the data from even a single random sequence thus nevertheless contain remnants of the disorder-averaged behavior. In particular, if $\ln m_{\min}$ is plotted versus $\ln f$ for a long enough polymer, the best-fit line should have slope $-2$, as predicted by our calculation of the disorder-average $\langle m\rangle$ [Eq. (45)], albeit with considerable scatter about the line. Figure 13 illustrates this point.
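The single-sequence fit described here can be sketched numerically as follows. Everything in this block is an assumption made for illustration: a discrete $\pm 1$ random walk stands in for $W(m)$, the biases are powers of two, and $m_{\min} + 1$ is used inside the logarithm to avoid $\ln 0$ when the minimum sits at $m = 0$. The code only demonstrates the fitting procedure; with a single sequence the fitted slope scatters considerably around the predicted value of $-2$.

```python
import math
import random

def random_walk(n_bases, seed):
    """Unbiased +/-1 random walk W(0..n_bases), an assumed stand-in for the sequence."""
    rng = random.Random(seed)
    walk, w = [0], 0
    for _ in range(n_bases):
        w += rng.choice((-1, 1))
        walk.append(w)
    return walk

def minimum_position(walk, bias):
    """Smallest m that minimizes W(m) + bias * m."""
    best_m, best_e = 0, float(walk[0])
    for m, w in enumerate(walk):
        e = w + bias * m
        if e < best_e:
            best_m, best_e = m, e
    return best_m

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def single_sequence_slope(n_bases=200_000, seed=0):
    """Fit ln(m_min + 1) against ln f for one fixed random sequence."""
    walk = random_walk(n_bases, seed)
    biases = [2.0 ** -k for k in range(2, 8)]   # f = 1/4 ... 1/128
    log_f = [math.log(f) for f in biases]
    log_m = [math.log(minimum_position(walk, f) + 1) for f in biases]
    return least_squares_slope(log_f, log_m)

if __name__ == "__main__":
    for seed in range(5):
        print(seed, round(single_sequence_slope(seed=seed), 2))
```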

The distribution of $m_{\rm jump}$ in the opposite limit $(f_1 - f_2)/f_2 \ll 1$ is the size distribution of jumps between two successive plateaus, one ending and the other starting at $f_1 \approx f_2$. Put in different terms, it gives the distribution of distances between two essentially degenerate minima at a given bias $f$, assuming that such minima exist. Because the two minima are already required to be at almost the same energy, $P_{\rm jump}$ is independent of $\mathcal{E}_{\rm jump}$ in this limit. The integral over $\mathcal{E}_{\rm jump}$ is then elementary, and the resulting distribution takes the form

$$P_{\rm jump}(m_{\rm jump}) = \frac{f_2}{\sqrt{2\pi\Delta}}\, \frac{1}{\sqrt{m_{\rm jump}}}\, \exp\!\left(-\frac{m_{\rm jump} f_2^2}{2\Delta}\right) , \qquad \frac{f_1 - f_2}{f_2} \ll 1 . \tag{56}$$

This expression is valid for $m_{\rm jump} f_2^2/\Delta < f_2/(f_1 - f_2)$; for larger values of $m_{\rm jump}$ the power law prefactor of $P_{\rm jump}$ crosses over from $1/(m_{\rm jump})^{1/2}$ to $1/(m_{\rm jump})^{3/2}$. The tail of the distribution thus still agrees with that of $P_{\min}$, as expected.

Knowledge of $P_{\rm jump}$ gives a detailed description of the statistics of $\langle m\rangle$ versus $f$ curves, under the assumption that $\langle m\rangle$ and $m_{\min}$ coincide. We have already seen that this assumption is valid with probability 1 as $f \to 0$. For any finite $f$, however, there will be occasions when it does not hold. In particular, it must break down in the vicinity of jumps between different plateaus. Near enough to a jump, the minima giving rise to the two plateaus will be nearly degenerate, and $\langle m\rangle$ will contain substantial contributions from both minima. Indeed, if a jump of size $m_{\rm jump}$ occurs at a bias $f_1$, then both minima will be appreciably occupied if the difference between their energies $|f - f_1|\, m_{\rm jump} \lesssim k_B T$. The sharp discontinuity in $\langle m\rangle$ at $f_1$ will be replaced by a smooth transition of width of order $k_B T/m_{\rm jump}$. We have already seen that $m_{\rm jump}$ is typically of order $\Delta/f^2$, so the width of a typical transition sharpens like $f^2$. In contrast, we have seen that a typical plateau at bias $f_1$ extends for a distance of order $f_1$. As $f_1 \to 0$, the width of the jumps thus becomes very small compared to the size of the plateaus, in agreement with our arguments that $m_{\min}/\langle m\rangle \to 1$ in this limit. The sharpening of the jumps as $f \to 0^+$ is evident in Figs. 11 and 13.
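The rounding of a single jump can be made concrete with a two-state sketch in which only the two nearly degenerate minima are kept. This truncation, the function names, and the default $k_B T = 1$ are assumptions made for illustration; the energy difference $(f - f_1)\, m_{\rm jump}$ between the two minima follows from $\mathcal{E}(m) = W(m) + fm$ and the degeneracy at $f = f_1$.

```python
import math

def occupancy_of_far_minimum(f, f_jump, m_jump, kT=1.0):
    """Boltzmann weight of the minimum at larger m, which wins for f < f_jump.

    The two minima are degenerate at f = f_jump, so their energy
    difference at bias f is (f - f_jump) * m_jump.
    """
    x = (f - f_jump) * m_jump / kT
    if x >= 0:                      # numerically stable logistic function
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

def mean_position(f, f_jump, m_low, m_high, kT=1.0):
    """Two-state thermal average <m> near a jump from m_low to m_high."""
    p_far = occupancy_of_far_minimum(f, f_jump, m_high - m_low, kT)
    return m_low + (m_high - m_low) * p_far

def transition_width(m_jump, kT=1.0):
    """Range of bias over which both minima are appreciably occupied."""
    return kT / m_jump

if __name__ == "__main__":
    for f in (0.08, 0.09, 0.10, 0.11, 0.12):
        print(f, round(mean_position(f, 0.10, 10, 110), 2))
```

In this sketch a jump of 100 bases is smeared over a bias range of order $k_B T/100$, while a jump of 1000 bases is ten times sharper.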



Note also that if the temperature is raised at fixed force near the unzipping transition (i.e. a vertical instead of a horizontal trajectory in the inset to Fig. 1), we have $f \sim T_c - T$. The surface contribution to the specific heat near the transition is thus $T\,\partial^2 G/\partial T^2 \sim T^2\,\partial^2 \ln Z/\partial f^2 \sim \partial\langle m\rangle/\partial f$, where $G = -k_B T \ln Z$ is the "surface" free energy of the partially unraveled polymer duplex at fixed temperature and force defined in section III. If $\langle m\rangle$ as a function of $f$ takes the form of a sequence of plateaus and jumps, then the derivative of $\langle m\rangle$ with respect to $f$ must vanish except in the vicinity of the jumps, where it will show a sharp spike proportional to the jump size $m_{\rm jump}$. As $f \to 0^+$ and the jumps become very sharp, the specific heat spikes will approach delta functions. Each jump can thus be thought of as a "micro-first-order transition."

We close this subsection with an example of how plateaus and jumps can appear in the unzipping of a biologically relevant DNA sequence, that of phage lambda [31]. Figure 14 plots the energy landscape $\mathcal{E}(m)$ of a 28 kb segment of the lambda genome for two different biases. The energy to open each base pair is taken from a widely-used parameter set [33], and we neglect the possibility of rare denaturation bubbles under physiological conditions. The energy landscape shows two pronounced minima; a third minimum very near $m = 0$ is barely visible. The corresponding plot of $\langle m\rangle$ versus the distance $f \sim F_c - F$ from the transition, determined by an exact evaluation of the partition function, appears in Figure 15. As expected, it consists of three plateaus, corresponding to the three minima. Thus, the qualitative ideas developed in this section apply to real sequences found in experimental biology as well as to the idealized random models explored here.

C. Application: Determination of base-pairing energies. 

In this subsection, we digress briefly from our primary focus on polynucleotides with random sequences to discuss 
how the mechanical denaturation of specially designed sequences might be used to measure the strength of the base- 
pairing and stacking interactions that stabilize polynucleotide duplexes. Traditionally, these interactions have been 
studied by analyzing the thermal melting curves of double-stranded DNA's and RNA's [33,57]. Most commonly, the
stability of a duplex is assumed to be determined by 10 phenomenological parameters giving the combined pairing
and stacking energies of the 10 possible distinct groups of two successive base pairs. These parameters can be inferred
from the melting temperatures of a set of duplexes with appropriately chosen sequences. Although in most ways quite
successful, this method has the disadvantage that it yields values of the 10 energy parameters only in the vicinity of the
melting temperatures of the double-stranded molecules. Because these energy parameters are expected to depend on
a variety of conditions, including salt concentration, pH, and (for entropic reasons) temperature, it would be useful to 
have a technique that allowed the measurement of duplex stability in a wider range of conditions. It has already been 
shown experimentally that micromechanical experiments can be used to estimate the binding energy of a particular 
RNA hairpin Q|. Here we extend the analysis of g to consider more generally how mechanical denaturation might 
be used to infer the stability of duplexes. 

Because it is difficult to synthesize long polynucleotides with prescribed sequences, one would like to be able to measure pairing energies on relatively short (tens of nucleotides) hairpins. Even for short hairpins, one can still define an average pairing energy $g_0$, a variation about the average $\eta(m)$, a critical unzipping force $F_c$ satisfying Eq. (Jl€), and a distance $f = 2g(F) - g_0$ from the transition. Drawing on the ideas developed in subsections V A and V B, we expect that, for a given hairpin in the constant force ensemble, $\langle m\rangle$ will remain close to minima of $\mathcal{E}(m)$ except for jumps at certain values of $f$. The measurement of $g_0$ for a hairpin of length $N$ is most straightforward if there are only two such minima, at $m = 0$ and $m = N$; the unzipping transition then shows two-state behavior [ p3[ . For this to be the case, the energy landscape $\mathcal{E}(m)$ must take roughly the form shown in Fig. 16. Because of the energy barrier between $m = 0$ and $m = N$, the unzipping fork is always localized in the vicinity of one of these minima, with a sharp jump between the two at $F_c$ (Fig. 17). $F_c$ is thus easily read off from the experimental extension versus force curve, and $g_0$ is then given by Eqs. (7) and (Jlq). Just as for the standard methods based on melting curves, the 10 energy parameters can be estimated from the knowledge of $g_0$ for enough different hairpins. One can straightforwardly design hairpins with two-state unzipping behavior by joining a stretch of strongly-paired bases to a less stable stretch. Thus, for example, if one strand of the hairpin has sequence 5'(C)$_{N/2}$(A)$_{N/2}$3', with opening starting from the 5' end (and complementary sequence 3'(G)$_{N/2}$(T)$_{N/2}$5'), $g_0$ for the hairpin approaches for large $N$ the average of the energies associated with (reading along one strand of a duplex) 5'CC3' and 5'AA3'. Similarly, 5'(CG)$_{N/4}$(AT)$_{N/4}$3', paired with its complement, gives the average of the energies associated with 5'CG3', 5'GC3', 5'AT3', and 5'TA3'. Corrections due to the junction between the two homopolymeric stretches and to the confinement energy of the loop section of the hairpin both decay like $1/N$; they can be eliminated by measuring hairpins with several different values of $N$. Mechanical denaturation in the constant force ensemble can thus be used systematically to determine the 10 standard duplex stability parameters in a wide range of pH, salt concentration, and temperature.
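The $1/N$ decay of the junction correction can be seen in a small counting sketch. The stacking energies below are hypothetical placeholder numbers chosen only for illustration (they are not measured values), the loop contribution is ignored, and the function names are not taken from any existing library.

```python
# Hypothetical stacking energies (arbitrary units) for the dinucleotide steps
# read along one strand; placeholder values, NOT measured parameters.
HYPOTHETICAL_ENERGIES = {"CC": 3.0, "CA": 2.0, "AA": 1.5}

def two_block_strand(n, strong="C", weak="A"):
    """Strand 5'(C)_{n/2}(A)_{n/2}3' for even n."""
    return strong * (n // 2) + weak * (n // 2)

def stacking_steps(strand):
    """All groups of two successive bases along the strand."""
    return [strand[i:i + 2] for i in range(len(strand) - 1)]

def mean_pairing_energy(strand, energies):
    """Average energy per step, a stand-in for g_0 of the hairpin stem."""
    steps = stacking_steps(strand)
    return sum(energies[s] for s in steps) / len(steps)

def junction_correction(n, energies=HYPOTHETICAL_ENERGIES):
    """Deviation of the stem average from the mean of the CC and AA energies."""
    target = 0.5 * (energies["CC"] + energies["AA"])
    return mean_pairing_energy(two_block_strand(n), energies) - target

if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        print(n, junction_correction(n))
```

With these placeholder values the deviation is exactly $(E_{\rm CA} - \bar{E})/(N-1)$, so measuring several values of $N$ and extrapolating in $1/N$ removes it, as described above.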

VI. CONSTANT EXTENSION ENSEMBLE 

So far, we have considered only the constant force ensemble, in which a fixed force is applied to the two single strands of the dsDNA, and one measures the average number of base pairs opened or the average separation $\langle r\rangle$ between the ends of the two single strands. Constant extension experiments, in which the separation $r$ is fixed, and the average force is measured, are also possible. In the classical thermodynamics of macroscopic systems, these two ensembles would be equivalent. That is, the functions $\langle r\rangle(F)$ and $\langle F\rangle(r)$ measured in the two ensembles would be inverses of each other. In single molecule experiments, however, such a relation is not guaranteed, and the two ensembles are in fact not equivalent in DNA unzipping. For simplicity, we assume throughout this section that the Kuhn length $b$ of the single-stranded polymer is equal to the length $a$ per chemical monomer.

We begin by considering the constant extension ensemble in the absence of sequence randomness. We neglect long-ranged interactions within the single-stranded polymers; because $\mathbf{r}$ and $\mathbf{F}$ will always be parallel on average, we can work with the (signed) scalars $r$ and $F$. Regardless of the elastic properties of the single stranded DNA (freely-jointed chain, Gaussian, etc.), one can define the statistical weight $G_{2m}(r)$ for a single-stranded chain of length $2m$ to have an end-to-end distance $r$. The partition function $Z$ in the constant extension ensemble can then be viewed as a weighted sum over the number of unzipped bases $m$ with $r$ fixed. Given the energy cost $g_0 m$ of opening $m$ bases, one has [18,20]

$$Z(r) = \int_0^{\infty} dm\, G_{2m}(r)\, \exp(-g_0 m/k_B T) . \tag{57}$$

In the limit of large $r$, one expects the number of unzipped bases $m$ to be proportional to $r$. It then makes sense to consider the free energy per base $h(x)$ of the liberated single strands as a function of the extension per base $x = r/2m$. The free energy per base $g(F)$ in the constant force ensemble is related to $h(x)$ by the Legendre transform $g(F) = h[x(F)] - Fx$, and in the thermodynamic limit $r \to \infty$ with $r/m$ fixed, we expect $-k_B T \ln[G_{2m}(r)] \simeq 2m\, h(x)$. It is not difficult to show that the leading correction to this result is of order $\ln(m)/2$. Hence, for large $r$ the partition function becomes, up to $r$-independent multiplicative constants,

where we have introduced $n = 2m/r = 1/x$ in the second line. For large $r$, $Z$ may be evaluated in the saddle point approximation, which gives

The $O(1/r)$ term comes from subleading corrections to $G_{2m}(r)$ that we have chosen not to calculate explicitly. The location $n^*$ of the saddle point satisfies $h'(1/n^*) = n^*[h(1/n^*) + g_0/2]$, where $h'(x) = dh/dx$ plays the role of a force. Indeed, upon using the Legendre transform relation between $h$ and $g$, we find that $h'(1/n^*) = F_c$. Thus, for large $r$, the average force in the constant extension ensemble takes the simple form



(F) = -k B T 
In the constant force ensemble, on the other hand, $\langle r\rangle \propto 1/f \sim 1/(F_c - F)$, which upon inversion gives the slower approach to $F_c$, $F = F_c + O(1/r)$. Both ensembles predict that complete unzipping of the dsDNA occurs at $F = F_c$; in fact, in the limit $r \to \infty$, the constant extension ensemble simply demonstrates coexistence of the bulk unzipped and zipped phases, as in any first order transition. The approach to $F = F_c$ as $r$ becomes large, however, is markedly different. Equivalence of ensembles exists only in the "thermodynamic limit" $r \to \infty$.

Because DNA unzipping does not show self-averaging, the situation becomes even more complicated when sequence randomness is introduced. In the constant force ensemble, $\langle m\rangle$ (and hence $\langle r\rangle$) increases monotonically as $F$ increases, for any DNA sequence. In the constant extension ensemble, in contrast, we expect large regions where $d\mathcal{E}/dm$, which plays roughly the role of $g_0$, is smaller than average; when the unzipping fork enters one of these regions, $\langle F\rangle$ should decrease. Precisely such behavior is observed in experiments and simulations on the unzipping of lambda phage DNA [20]: $\langle F\rangle$ is seen to vary randomly about an average value as $r$ is increased. For a given random sequence, the functions $\langle r\rangle(F)$ and $\langle F\rangle(r)$ thus cannot be inverses of each other.

One can still ask, however, whether the disorder averages $\overline{\langle r\rangle}(F)$ and $\overline{\langle F\rangle}(r)$ are simply related. Once sequence heterogeneity is present, a term proportional to $W(m)$ [see Eq. (52)] must be incorporated into $Z(r)$. In analogy to Eqs. <M\ and <M), one finds

where $k = (1/n^*)^3 h''(1/n^*)$. In passing from the first to the second expression, we have used the scaling properties of a random walk to make the substitution, valid on the level of statistical distributions, $W(rn/2) = \sqrt{r/2}\, W(n)$. We have also expanded around the location $n^*$ of the saddle point in the non-random case. Because the average terms in the exponential grow like $r$, while the coefficient of $W(n)$ is only proportional to $\sqrt{r}$, this expansion will still give the correct asymptotic behavior as $r \to \infty$.

Eq. ( p2| ) shows that the leading corrections to $\langle F\rangle(r)$ can be described by the equilibrium extension of a spring "dragged" across a random potential [18]. One can estimate the spring's extension by balancing the elastic energy cost of extension $-rk(n - n^*)^2/2$ with the typical random energy gain $\sqrt{r}\, W(n - n^*) \sim \sqrt{r\Delta|n - n^*|}$. These two terms are of the same order when $|n - n^*| \sim (\Delta/k^2 r)^{1/3}$. The typical energy gain due to extension is then $\sqrt{r\Delta|n - n^*|} \sim (\Delta^2 r/k)^{1/3}$; note that although $n - n^*$ is positive or negative with equal probability, the associated change in energy must always be negative. We thus expect that the disorder-averaged free energy should behave like

Note that the term proportional to $W(n^*)$ averages to zero. Upon taking a derivative with respect to $r$, one concludes that the disorder-averaged force in the constant extension ensemble approaches $F_c$ for large $r$ according to

In contrast, in the constant force ensemble, $\langle r\rangle \sim \langle m\rangle \sim 1/(F_c - F)^2$, which upon inversion gives $F_c - F \sim 1/\sqrt{\langle r\rangle}$. Once again, the two ensembles agree only on the location of the unzipping transition!

There is one further, more subtle relationship between the two ensembles with sequence randomness. For a given sequence, the constant force partition function can be written in either of two ways:

$$Z(F) = \int_0^{\infty} dm\, e^{-[f m + W(m)]/k_B T} \propto \int dr\, e^{-[(F_c - F) r + \delta\mathcal{F}(r)]/k_B T} , \tag{64}$$

where $\delta\mathcal{F}(r) = -k_B T \ln[Z(r)] - F_c r$. These two expressions must of course ultimately lead to the same result, and this fact has interesting consequences for the properties of $\delta\mathcal{F}$. Near the unzipping transition, $f = 2g(F) - g_0 \simeq 2g'(F_c)(F - F_c) = 2|g'(F_c)|(F_c - F)$. Up to a constant factor (and neglecting exponentially suppressed contributions to the second integral from $r < 0$), both expressions for $Z(F)$ can thus be viewed as Laplace transforms with respect to the same variable. Hence, we expect that $\delta\mathcal{F}(r)$ must have statistics very similar to those of $W(m)$. In particular, for small $F_c - F$, the integral with respect to $r$, like the one with respect to $m$, is likely to be dominated by its absolute minimum. In order to give the correct sequence of plateaus and jumps, $\delta\mathcal{F}(r)$ should thus behave like a random walk for large $r$, with $[\delta\mathcal{F}(r') - \delta\mathcal{F}(r)]^2 \sim 2|g'(F_c)|\Delta|r' - r|$. Scaling arguments due to Gerland et al. Q suggest that the force deviation $\delta F(r) = \langle F(r)\rangle - F_c = d\,\delta\mathcal{F}/dr$ should have a variance that decays like $\delta F(r)^2 \sim \Delta^{1/3}(k/r)^{2/3}$. For $[\delta\mathcal{F}(r') - \delta\mathcal{F}(r)]^2$ to behave correctly at large scales, $\delta F(r)$ must then have a correlation length that grows like $r^{2/3}$. One plausible explanation for this behavior is that, much as in the constant force ensemble, $n$ locks into a single minimum of $W(n)$ as $r$ is increased over a finite interval before jumping to a new minimum, with the size of this interval increasing as $r$ grows larger.

VII. DYNAMICS 

So far, we have only considered static, equilibrium behavior. In real experimental systems, of course, dynamical effects can play an important role. The complete description of the dynamics of the unzipping transition, allowing for the possibility of thermally activated denaturation bubbles in the bulk dsDNA, is a challenging and still open problem. Four time scales come into play: the time scales $\tau_{\rm end}$ and $\tau_{\rm bulk}$ of base pairing and unpairing at the end of a double-stranded region and in the bulk, the relaxation time $\tau_{\rm ss}(m)$ of the liberated single strands, and the rotational relaxation time $\tau_{\rm rot}(m)$ of the still zipped dsDNA, which because of its helical structure develops excess twist as it is unravelled from one end. The latter two time scales are expected to depend on $m$. Cocco et al. [14] have suggested that there may be a fifth scale associated with overcoming an additional energy barrier to unzipping the first few bases of an initially blunt-ended dsDNA, but such a barrier would not affect the long time unzipping dynamics. Although not the subject of extensive investigation, the opening time $\tau_{\rm end}$ of terminal base pairs is thought to be between 1 and 10 msec ]bf| , p8[ . Because opening a base pair in the middle of a double-stranded region requires overcoming two stacking interactions, instead of one for opening at the end, we expect $\tau_{\rm bulk} \gg \tau_{\rm end}$ [n3J53]; in unzipping experiments, the pulling force will further accelerate base-pair opening at the unzipping fork. Marenduzzo et al. [12] have argued that the relaxation time of the ssDNA is given by the time required to move the entire single strand a distance $x$ for each monomer that is opened or closed. Because the forces required to denature dsDNA are fairly large, each single-stranded monomer will be under considerable tension, with the average extension $x$ per monomer of order the monomer size $a$. The mobility of a single strand of length $m$ is then of order $1/(4\pi\eta a m)$, where $\eta$ is the solvent viscosity, regardless of whether the strand is described by the Rouse or by the Zimm model. Assuming a force $F$ of order 10 pN, one then finds that $\tau_{\rm ss}(m) \sim 4\pi\eta a^2 m/F \sim (1\ {\rm nsec})\, m$. Similarly, we can estimate the rotational relaxation time $\tau_{\rm rot}$ of a dsDNA molecule of length $N - m$ by finding the time for it to turn through $2\pi/10.5$ radians (with the denominator of 10.5 arising from the number of base pairs per helix turn in B form DNA in solution p3|). For a dsDNA strand of radius 1 nm, the torque exerted by the two single strands under tension is roughly $2 \times 10\ {\rm pN} \times 1\ {\rm nm} = 20\ {\rm pN \cdot nm}$.
Classically, the rotational mobility /i ro t of a dsDNA molecule of length N has been calculated by assuming it is a 
straight, rigid rod, yielding the value /i rot ~ (2 x 10 _8 sec/gcm 2 )A J54j; this would imply T rot ~ (3nsec)(iV — to). 
More recently, Nelson has argued that the presence of intrinsic bends in natural dsDNA could decrease the rotational 
mobility, and thus increase r rot , by several orders of magnitude ]5q | . 
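The order-of-magnitude estimates quoted above for the single strands can be reproduced with a few lines of arithmetic. The viscosity of water ($10^{-3}$ Pa s) and a monomer size of 1 nm are assumed inputs chosen for this sketch; the force of 10 pN, the 10.5 base pairs per helix turn, and the 1 nm duplex radius are the values used in the text.

```python
import math

def single_strand_relaxation_time(m, force=10e-12, viscosity=1e-3, monomer_size=1e-9):
    """tau_ss(m) ~ 4 pi eta a^2 m / F in seconds (SI units throughout).

    viscosity and monomer_size are assumed values (water, a = 1 nm).
    """
    return 4.0 * math.pi * viscosity * monomer_size ** 2 * m / force

def unzipping_torque(force=10e-12, radius=1e-9):
    """Torque from two single strands under tension acting at the duplex radius (N m)."""
    return 2.0 * force * radius

def twist_per_base_pair(bp_per_turn=10.5):
    """Rotation angle (radians) needed to open one base pair."""
    return 2.0 * math.pi / bp_per_turn

if __name__ == "__main__":
    print("tau_ss per base (s):", single_strand_relaxation_time(1))
    print("torque (N m):", unzipping_torque())
    print("angle per base pair (rad):", twist_per_base_pair())
```

With these inputs $\tau_{\rm ss}$ comes out close to 1 nsec per opened base, so that a strand of $10^4$ bases relaxes on a time of order 10 microseconds.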

The time dependence of the number of unzipped bases $m(t)$ will be determined by which of these four time scales is the slowest. The most difficult situation to analyze occurs if the system is dominated by $\tau_{\rm bulk}$, as we expect to be the case for small enough $m$ and $N$. In this case, the dynamics of the denaturation bubbles in the bulk dsDNA will be slower than the dynamics of the actual unzipping. Unlike in our equilibrium calculations, the bubbles then cannot be integrated out to give an effective (local) dynamics that depends only on $m$. Indeed, in the limit that bases at the unzipping fork open much faster than those in the bulk, the unzipping fork will propagate into an almost frozen landscape of opened and closed base pairs. Strongly non-equilibrium effects, including a depression of the effective $F_c$, could then become apparent [p6| . Because $\tau_{\rm ss}(m)$ grows with $m$, it must eventually become slower than $\tau_{\rm bulk}$; beyond this point, more conventional behavior should reemerge.

Fortunately, in physiological conditions, opening of base pairs in bulk dsDNA is extremely rare. Well below the melting temperature, it is then reasonable to assume that all base pairs beyond the unzipping fork are closed, and to focus only on the position of the unzipping fork. Consider first the case in which the slowest of the three remaining timescales is independent of $m$, either because $\tau_{\rm end}$ is the slowest (as will be the case for $m < N \lesssim 10^3$ or $10^4$ under the assumption of a straight rod rotational mobility for dsDNA) or because $\tau_{\rm rot}$ is the slowest, but with $m \ll N$ so that changes in $m$ have a negligible effect on $\tau_{\rm rot}$. In this regime, the unzipping dynamics is dominated by the diffusion of the unzipping fork in the one-dimensional energy landscape $\mathcal{E}(m)$. In other words, it is an example of the well-studied problem of a random walk in a random force field, sometimes known as the Sinai problem p4j (for reviews, see p5[). The overdamped dynamics associated with the continuum free energy of Eq. (13) then takes the form

$$\frac{dm}{dt} = -\Gamma \frac{d\mathcal{E}(m)}{dm} + \zeta(t) = -\Gamma[f + \eta(m)] + \zeta(t) , \tag{65}$$

where the effect of thermal fluctuations is included through the noise source $\zeta(t)$ with correlations

$$\langle \zeta(t)\zeta(t')\rangle = 2 k_B T\, \Gamma\, \delta(t - t') . \tag{66}$$

The magnitude of the phenomenological drag coefficient $\Gamma$ is set by the slowest time scale:

$$\Gamma = \frac{1}{\tau k_B T} , \tag{67}$$

with $\tau$ equal to $\tau_{\rm end}$ or $\tau_{\rm rot}$ as appropriate. We expect that Eq. (65) describes the dynamics of the unzipping fork at long times for small $f$. In the absence of sequence heterogeneity ($\eta(m) = 0$), it yields simple diffusion with drift above the unzipping transition,

$$\langle m(t)\rangle = (\Gamma|f|)\, t \quad {\rm and} \quad \langle [m(t) - \langle m(t)\rangle]^2\rangle = (2\Gamma k_B T)\, t . \tag{68}$$

In contrast, in the presence of sequence heterogeneity, the long time dynamics is determined by large energy barriers that grow with $m$; a number of rigorously-established results can then be reproduced by simple physical arguments P5| , ^9| , p6| . For example, when $F = F_c$ (i.e. $f = 0$), $\mathcal{E}(m) \sim \sqrt{\Delta m}$; taking this to be a typical barrier size, one finds that the time to go a distance $m$ is $t \sim \tau \exp(\sqrt{\Delta m}/k_B T)$, suggesting that $m(t)$ is typically of order $\ln^2(t/\tau)$. Indeed, it is known that in the presence of a single reflecting wall (in our case, the end from which the semi-infinite duplex is being unzipped), the ratio $m(t)/\ln^2(t/\tau)$ approaches a $t$-independent limiting distribution at large times p9[ . Similarly, just below the unzipping transition, the unzipping fork is essentially always in a region where the small bias $f$ can be ignored. Given that $m_{\rm jump} \sim \langle m\rangle \sim \Delta/f^2$, we expect that the typical time to equilibrate at a bias $f$ (and
in particular to jump from one local minimum to a new minimum with lower energy as / is decreased) should be of 
order rexp(vA//), a result that is supported, up to logarithmic factors, by renormalization group calculations fl49|| . 
Just above F c , the dsDNA must eventually unzip completely, but the propagation of the unzipping fork is again 
dramatically slowed by the presence of large energy barriers. The distribution of barrier heights is known to have 
exponential tails |p9|] , leading to a distribution of trapping times T that decays like 1/T M+1 , with 

71 - 2k B T\f\/A . (69) 

This same exponent appeared, for example, in Eq. (pq), and is known more generally to control the probability of 
large excursions of a biased random walk [e.g. £(m)] against its bias [p0| . The time to open to base pairs is a sum of 
0(777) such trapping times, with each time chosen independently. For fi < 1, the median value of this sum grows like 
to 1 '^, so one has sublinear growth with time of the sequence-averaged degree of unzipping, 



(m(t)>~^ (p <1). (70) 

The average extent of unzipping $\langle m(t) \rangle$ of a given polynucleotide is typically also of order $t^{\mu}$, but with time and
sequence-dependent fluctuations in the prefactor. For $1 < \mu < 2$, $\langle m(t) \rangle \sim t$ recovers its usual behavior, but there is
still anomalous behavior in the second cumulant: $\langle m(t)^2 \rangle - \langle m(t) \rangle^2$ typically grows like $t^{2/\mu}$. Conventional diffusion
with drift is recovered only for forces large enough that $\mu > 2$, or $|f| > \Delta/k_B T \sim O(k_B T)$ for dsDNA in physiological
conditions. For the freely-jointed chain expression ([7|) for $g(F)$, this condition translates to $F - F_c > 5$ pN; there
is thus a substantial window where anomalous drift can be observed in a single molecule experiment. Just as for
the equilibrium results discussed earlier in this paper, most of the qualitative features of the unzipping dynamics
for uncorrelated random sequences also apply to the unzipping of correlated random sequences, albeit with different
exponents [61].
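The growth law behind Eq. (70) can be checked numerically in a rough way. The sketch below is only an illustration: it assumes Pareto-distributed trapping times with tail exponent $\mu$ as a stand-in for the true trapping-time distribution, and the function names, sample sizes, and seed are hypothetical choices rather than anything taken from the paper. It estimates the exponent $a$ in (median opening time) $\sim m^{a}$, which should come out close to $1/\mu$ for $\mu < 1$.

```python
import math
import random
import statistics


def median_opening_time(m, mu, n_samples, rng):
    """Median, over n_samples realizations, of a sum of m trapping times.

    Each trapping time is Pareto distributed with P(T > t) = t**(-mu) for
    t >= 1, so its density decays like 1/t**(mu + 1). This is an assumed
    stand-in for the true trapping-time distribution.
    """
    sums = []
    for _ in range(n_samples):
        # rng.paretovariate(mu) returns values >= 1 with tail exponent mu.
        sums.append(sum(rng.paretovariate(mu) for _ in range(m)))
    return statistics.median(sums)


def fitted_exponent(mu, m_small=100, m_large=1000, n_samples=400, seed=0):
    """Estimate a in (median time) ~ m**a from two values of m."""
    rng = random.Random(seed)
    t_small = median_opening_time(m_small, mu, n_samples, rng)
    t_large = median_opening_time(m_large, mu, n_samples, rng)
    return math.log(t_large / t_small) / math.log(m_large / m_small)


if __name__ == "__main__":
    mu = 0.5
    a = fitted_exponent(mu)
    print(f"mu = {mu}: fitted exponent {a:.2f}, expected 1/mu = {1 / mu:.2f}")
```

Inverting $t \sim m^{1/\mu}$ gives the sublinear growth $m \sim t^{\mu}$. The estimate is noisy because the sums are dominated by their largest terms, so only rough agreement with $1/\mu$ should be expected.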

These results have interesting implications for attempts to read sequence information via experiments which monitor
the velocity $dm/dt$ of the unzipping fork for a fixed force $F > F_c$. Read naively, Eq. (^5|) suggests that the coarse-grained
sequence fluctuations embodied in $\eta(m)$ and the thermal noise $\zeta(t)$ will together modulate a mean unzipping
velocity $\langle dm/dt \rangle = \Gamma |f|$. This picture is certainly correct sufficiently far above the unzipping transition, where deep
traps in the energy landscape are rare. However, we can estimate that thermal fluctuations will obscure the
sequence-dependent modulation of the mean velocity whenever

$$\langle \zeta(t) \zeta(t') \rangle \gg \Gamma^2 \, \overline{\eta(\Gamma |f| t) \, \eta(\Gamma |f| t')} , \tag{71}$$

where we have used the zeroth-order relation

$$m(t) \approx \Gamma |f| t \tag{72}$$

to approximate $m(t)$. Eqs. (pi) and (66) then show that thermal noise can only be neglected provided

$$2 k_B T \, \Gamma \ll \frac{\Gamma \Delta}{|f|} , \tag{73}$$

or for

$$\mu = \frac{2 k_B T |f|}{\Delta} \ll 1 . \tag{74}$$

In this regime, however, the approximation (72) breaks down; indeed, we have seen that for $\mu < 1$, the dynamics
is dominated by the presence of deep traps in the energy landscape, with $m(t) \sim t^{\mu}$. Efforts to extract sequence
information from $dm/dt$ in this regime will be seriously hampered by the slow, erratic dynamics associated with
energy barriers of order $\sqrt{\Delta m}$.

The results discussed above are valid as long as the slowest time scale $\tau$ is roughly independent of $m$. If the $m$-dependence
becomes important, large energy barriers still dominate the dynamics, but our arguments must be modified to account
for this new feature p2 ]. Thus, for example, if $m$ becomes large enough, the relaxation of the single strands will set
the basic scale for the dynamics. Exactly at the transition, we then expect $t \sim m \exp(\sqrt{\Delta m}/k_B T)$ (the prefactor of
$m$ arising from the fact that $\tau_{ss} \sim m$); this yields exactly the same very slow asymptotic behavior $\langle m \rangle \sim \ln^2(t)$ as
before. Likewise, the equilibration times below the transition remain unchanged. On the other hand, for $F > F_c$,
new behavior emerges. The time to go a distance $m$ is now of order $\sum_{n=0}^{m} n T_n$, with each of the $T_n$ chosen from the
same distribution with tails like $1/T^{\mu+1}$. The median of the distribution of this new sum occurs at a time of order
$m^{(\mu+1)/\mu}$, suggesting $\langle m \rangle(t) \sim t^{\mu/(\mu+1)}$. As hypothesized in [12], the scaling laws in this regime are thus related to
those for $\tau_{ss}(m) < \tau_{end}$ by the substitution $t \mapsto t/m$. Similarly, when $\tau_{rot}(m) \sim N - m$ is the slowest timescale, the
logarithmic growth at or below the transition remains unchanged, while above the transition an analysis of a sum of
trapping times $\sum_{n=0}^{m} (N - n) T_n$ suggests $\langle m \rangle(t) \sim N[1 - (1 - k t^{\mu}/N^{\mu+1})^{1/(\mu+1)}]$, with $k$ an undetermined constant.
Thus, the fact that $\tau_{ss}$ and $\tau_{rot}$ depend on $m$ does not change the essential physical result that sequence randomness
leads to large energy barriers, and thus to a substantial slowing down of unzipping.
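The three growth laws quoted for $\mu < 1$ can be connected by one heuristic, sketched here as an aid to the reader rather than as the authors' derivation. It assumes the standard property of heavy-tailed sums that, for independent $T_n$ with tails $1/T^{\mu+1}$ and non-negative weights $a_n$, the typical size of $\sum_n a_n T_n$ is $(\sum_n a_n^{\mu})^{1/\mu}$. The weights $a_n$ and the constants $k$, $k'$ are notation introduced only for this sketch.

```latex
% Typical size of a weighted sum of heavy-tailed trapping times (mu < 1):
%   t \sim \Big( \sum_{n=0}^{m} a_n^{\mu} \Big)^{1/\mu}
\begin{align*}
a_n = 1:\quad
  & t^{\mu} \sim m
  && \Rightarrow\; m \sim t^{\mu}, \\
a_n = n:\quad
  & t^{\mu} \sim \sum_{n=0}^{m} n^{\mu} \approx \frac{m^{\mu+1}}{\mu+1}
  && \Rightarrow\; m \sim t^{\mu/(\mu+1)}, \\
a_n = N-n:\quad
  & t^{\mu} \sim \sum_{n=0}^{m} (N-n)^{\mu}
    \approx \frac{N^{\mu+1} - (N-m)^{\mu+1}}{\mu+1}
  && \Rightarrow\; (N-m)^{\mu+1} = N^{\mu+1} - k\,t^{\mu}, \\
  & && \Rightarrow\; m = N\Big[\,1 - \big(1 - k\,t^{\mu}/N^{\mu+1}\big)^{1/(\mu+1)}\Big].
\end{align*}
% Consistency check: for k t^mu << N^{mu+1} the last line reduces to
% m ~ k' t^mu / N^mu, i.e. the a_n = 1 law with every trapping time
% multiplied by the common factor N.
```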

VIII. CONCLUSIONS 

In this paper, we have given a detailed theoretical analysis of a simple micromechanical experiment: the mechanical
denaturation, or unzipping, of double-stranded DNA with a random base sequence. Although of current experimental
interest in its own right, this system can also serve as a springboard for developing ideas with potential applications
to micromanipulation experiments on more structurally complicated biomolecules. Several such ideas emerged from
our study. On the most basic level, the constant force and constant extension ensembles were shown to give different
force-extension curves in single molecule experiments. We argued that unzipping in the constant force ensemble can
always be described by a one-dimensional free energy landscape $\mathcal{E}(m)$, with an average slope $f = 2g(F) - g_0$ set by
the applied force $F$ and $F$-independent fluctuations about this average determined by the structure and sequence of
the molecule being examined. The number of monomers $\langle m \rangle$ liberated at a given $F$ is then simply an equilibrium
average over $m$ with weight $\exp[-\mathcal{E}(m)/k_B T]$. Once sequence variation is present, $\mathcal{E}(m)$ will in general pass below
zero for small enough $f > 0$. Partial mechanical denaturation then allows the liberated monomers to gain more free
energy by aligning with the applied force than they lose by breaking native contacts. For small $f$, $\langle m \rangle$ should be
dominated by the deepest minima in $\mathcal{E}(m)$.

For the particular case of unzipping a single dsDNA molecule, these qualitative observations can be given more
precise meaning. The energy function $\mathcal{E}(m)$ then behaves like a biased random walk on scales beyond a few bases.
When the pulling force $F$ is increased to a critical value $F_c$, the bias $f$ changes sign, and a phase transition occurs.
Randomness is always relevant at this wetting-like transition, with the average number of broken base pairs $\langle m \rangle$
diverging like $1/(F_c - F)$ for homopolymer duplexes, but like $1/(F_c - F)^2$ in the presence of a random sequence.
Individual dsDNA molecules approaching the unzipping transition open in a sequence of sharp jumps, separated by
long plateaus in which $\langle m \rangle$ remains essentially constant. The jumps become sharper and sharper as $f \to 0$. For small
$f$, $\langle m \rangle$ for any given polymer must approach the absolute minimum of $\mathcal{E}(m)$. The plateaus and jumps can then be
understood as arising from a sequence of minima. A given minimum remains stable over a range of $f$ values. As
the bias $f$ decreases, however, eventually a new minimum at larger $m$ will become lower in energy; at this point,
$\langle m \rangle$ will jump to the new minimum. Starting from this picture, we were able to make precise predictions about
statistical features of single molecule unzipping such as the distribution of jump sizes $m_{\rm jump}$. These showed good
agreement with simulations. The distribution of $m_{\rm jump}$ also revealed that the correlation between $\langle m \rangle$ at different
values $f_1$ and $f_2 < f_1$ of $f$ vanishes for small $f_2/f_1$. As a result, even though $\langle m \rangle$ can differ significantly from $\overline{\langle m \rangle}$
at any single force value, a plot of $\langle m \rangle$ versus $f$ for a given random sequence still shows the same scaling behavior
as does the average over many sequences $\overline{\langle m \rangle}$. Several of these features, most notably the dominance of the absolute
minimum, are known to occur more generally in random systems; indeed, an added interest of DNA unzipping is
that it is a physical realization of one of the simplest models in the statistical mechanics and dynamics of random
systems [24,25]. Similar conclusions should apply to experiments on the unzipping of individual RNA hairpins [Q ,
although experiments on longer hairpins would be required to provide a complete test of the theory.

Although the predictions for DNA unzipping do not apply directly to micromechanical assays on systems such
as proteins Q or the complex RNA folds of naturally occurring ribozymes jjj, they do suggest a definite agenda for
understanding such experiments. In varying the pulling force $F$ in the constant force ensemble, one is essentially
searching for local minima along the denaturation pathway; each observed plateau corresponds to a state that is
metastable at zero force, but is stabilized in an appropriate range of $F$ values. If $g(F)$ can be determined from
measurements on unfolded strands, then the energies of the original metastable states are easily inferred from the forces
at which jumps occur. Related ideas have been applied with great success to the interpretation of micromanipulation
experiments on individual "lock and key" bonds [62].

This picture of plateaus and jumps could break down if, instead of traversing only a single pathway, the
mechanical denaturation can proceed along one of many different routes [18]. For example, in micromanipulation
experiments on folded RNA's, it can transpire that a series of many hairpins are under tension simultaneously, as in
Figure 18. In the constant force ensemble, if there are $M$ long hairpins with independently chosen random sequences,
the average extension $\langle r \rangle$ will be simply the sum of $M$ independent single hairpin extensions. As a function of $f$,
each of these single hairpins will go through its own sequence of plateaus and jumps. Each time a particular hairpin
has a jump, $\langle r \rangle$ will also jump, but the typical jump size will be $\langle r \rangle/M$ instead of $\langle r \rangle$. Similarly, the plateaus in $\langle r \rangle$
will be shortened: The probability that none of the single hairpins jump as $f$ is decreased from $f_1$ to $f_2$ is $(f_2/f_1)^M$,
which decays very quickly for large $M$. As $M$ increases, shorter and shorter jumps and plateaus will eventually merge
into a smooth curve. Indeed, one expects that as $M \to \infty$, $\langle r \rangle \to M \overline{\langle r \rangle} + O(\sqrt{M})$, where $\overline{\langle r \rangle}$ on the right-hand side is
the disorder-averaged extension of a single hairpin. That is, a system of many
hairpins should exhibit self-averaging. Moreover, because the limit of many hairpins is essentially a thermodynamic
limit, equivalence of ensembles must also be recovered. In fact, the force-extension curve in the constant extension
ensemble must approach the disorder-averaged curve for the constant force ensemble as $M$ becomes large. In physical
terms, there must be a constant tension along the entire chain of hairpins; in the limit of many hairpins, each one
sees this tension rather than the extension imposed on the entire chain. Once there are enough competing hairpins,
any equilibrium experiment will give the same smooth curve. Such smoothing, with its attendant loss of structural
information, has recently been observed in simulations [18]. Both the continuous increase of the disorder average and
the plateaus and jumps of a single hairpin can thus appear in single molecule experiments.
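To get a feeling for how quickly plateaus disappear, the no-jump probability $(f_2/f_1)^M$ quoted above can be tabulated for a few values of $M$. The snippet below is a minimal sketch; the function name and the example ratio $f_2/f_1 = 0.77$ (the value used for the jump histogram of Figure 12) are illustrative choices.

```python
def plateau_survival(f1, f2, n_hairpins):
    """Probability that none of n_hairpins independent hairpins jumps
    while the bias is lowered from f1 to f2 (requires 0 < f2 <= f1)."""
    if not 0 < f2 <= f1:
        raise ValueError("need 0 < f2 <= f1")
    return (f2 / f1) ** n_hairpins


if __name__ == "__main__":
    # Survival probability of a plateau for an increasing number of hairpins.
    for n in (1, 5, 20, 100):
        print(n, plateau_survival(1.0, 0.77, n))
```

Already for a few tens of hairpins the chance of observing a plateau over this range of $f$ drops below one percent, which is the smoothing effect described in the text.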

We note in conclusion that the ideas from the physics of one-dimensional disordered systems applied here to
mechanical denaturation experiments may find applications elsewhere in biophysics. To cite one example, the DNA-binding
protein recA adheres with a binding affinity that depends strongly on the nucleotide sequence |6JJ. When
ATP is replaced by the non-hydrolyzable analog ATP-$\gamma$S, allowing the system to reach equilibrium, the position of
the point-like polymerization boundary separating domains of polymerized recA from bare DNA can be described
by a coarse-grained model like Eqs. (13)-(15). Similarly, the motion of a single boundary during polymerization can
be described as biased diffusion in a random force field, and one might expect in appropriate parameter ranges to
find strong disorder-induced slowing of the sort discussed in section VII. More generally, the kinetics of multiple
polymerization boundaries (associated with multiple recA domains) on a single long polynucleotide can naturally be
mapped to the dynamics of kinks in a one-dimensional random field Ising model, which is known to be in the Sinai
universality class |^J,^5[ . Although the relevance of such anomalous dynamics to the functioning of biological systems
in vivo remains to be established, these effects may play a role in a number of in vitro assays.
APPENDIX A: SIMULATION METHODOLOGY

This appendix describes the numerical method used to generate the data points in Figures 8 through 13. The
simulations were performed on a simplified model of dsDNA in which all A-T base pairs have a pairing energy $\epsilon_{AT}$
and all G-C base pairs a pairing energy $\epsilon_{GC}$ PQ]. In contrast to the convention of subsection II A, here we define pairing
energies as the free energy difference between the bound base pair and the two monomers subject to the tension $F$.
The average pairing energy of the sequence is thus $f$. All base pairs other than the $m$ unzipped bases are assumed to
be closed, an excellent approximation for dsDNA in physiological conditions. We are interested primarily in behavior
near the unzipping transition, where many bases have been unzipped. In this regime, most of our predictions depend
only on universal properties of random walks, so the simplifications in our model are justified. Our results are always
reported in terms of the parameters $f$ and $\Delta$ that can be defined with reference only to the large $m$ behavior of $\mathcal{E}(m)$.
We assume for simplicity that A-T and G-C pairs occur with equal probability 1/2, and we take the pairing energies
to be $\epsilon_{AT} = f - \sqrt{\Delta}$ and $\epsilon_{GC} = f + \sqrt{\Delta}$. The disorder strength $\Delta$ is usually chosen to be between 1 and 9, while
$f$ varies from 1 down to a lower bound determined by demanding that $\overline{\langle m \rangle} \approx \Delta/(2f^2) < N/8$. Here $N$ is the total
number of base pairs in the dsDNA, which we usually choose to fall between $5 \times 10^5$ and $5 \times 10^6$. For a given sequence
$\{\epsilon_i\}$, with each $\epsilon_i$ equal to either $\epsilon_{AT}$ or $\epsilon_{GC}$, $\mathcal{E}(m)$ takes the form $\mathcal{E}(m) = \sum_{i=1}^{m} \epsilon_i$. The average and variance of
$\mathcal{E}$ are then $\overline{\mathcal{E}(m)} = mf$ and $\overline{\mathcal{E}(m)^2} - \overline{\mathcal{E}(m)}^2 = \Delta m$, allowing direct contact with the continuum limit described by
Eqs. (13) and (14). The temperature $k_B T$ is set to one.
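The two-letter model just described is simple enough to sketch in a few lines. The code below is an illustration under the stated assumptions (equal A-T and G-C probabilities, $\epsilon = f \mp \sqrt{\Delta}$, $k_B T = 1$); the function names and the parameter values in the usage lines are hypothetical and are not taken from the original simulations.

```python
import math
import random


def random_landscape(n_pairs, f, delta, rng):
    """Return the list [E(0), E(1), ..., E(N)] for one random sequence.

    Each base pair is A-T (energy f - sqrt(delta)) or G-C (energy
    f + sqrt(delta)) with probability 1/2, and E(m) is the running sum
    of the first m pairing energies, with E(0) = 0.
    """
    root = math.sqrt(delta)
    energies = [0.0]
    for _ in range(n_pairs):
        eps = f - root if rng.random() < 0.5 else f + root
        energies.append(energies[-1] + eps)
    return energies


def deepest_minimum(energies):
    """Index m_min at which E(m) is lowest (first occurrence)."""
    return min(range(len(energies)), key=energies.__getitem__)


if __name__ == "__main__":
    rng = random.Random(1)
    landscape = random_landscape(20000, f=0.05, delta=9.0, rng=rng)
    m_min = deepest_minimum(landscape)
    print("m_min =", m_min, " E(m_min) =", landscape[m_min])
```

Because $\mathcal{E}(0) = 0$, the deepest minimum always has non-positive energy; it sits at $m > 0$ whenever the landscape dips below zero.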

Our one-dimensional system is sufficiently simple that it is possible to proceed by direct evaluation of the partition
function $Z = \sum_{m=0}^{N} \exp[-\mathcal{E}(m)]$ and the average number of unzipped bases $\langle m \rangle = \sum_{m=0}^{N} m \exp[-\mathcal{E}(m)]/Z$. For
each random sequence, successive values of $\epsilon_i$ are chosen at random, starting with $\epsilon_N$. The running sums
$Z_i = \sum_{m=i}^{N} \exp[-\mathcal{E}(m) + \mathcal{E}(i-1)]$ and $\langle m \rangle_i = \sum_{m=i}^{N} m \exp[-\mathcal{E}(m) + \mathcal{E}(i-1)]$ are then updated according to
$Z_i = \exp(-\epsilon_i)(1 + Z_{i+1})$ and $\langle m \rangle_i = \exp(-\epsilon_i)(i + \langle m \rangle_{i+1})$; once the sum is complete, $\langle m \rangle$ is normalized by dividing by
$Z$. We keep separate sums for each value of $f$, and, at each $i$, update each of them with the same random choice of
$\epsilon_{AT}$ or $\epsilon_{GC}$. In some runs, we also kept track of the running sum of $\epsilon_i$ and of the location of the deepest minimum
encountered up to position $i$.
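The backward recursion can be written out as a short sketch. The version below follows the update rules quoted above, with $Z_{N+1} = \langle m \rangle_{N+1} = 0$ as the starting values and the $m = 0$ term added at the end; the function names are hypothetical, and a brute-force sum is included only as a cross-check. It works with plain floating-point weights, which is an assumption that holds only for short sequences.

```python
import math


def unzip_average(eps):
    """Return (Z, <m>) from the backward recursion, with k_B T = 1.

    eps[i - 1] is the pairing energy of base pair i (i = 1, ..., N), and
    E(m) = eps_1 + ... + eps_m with E(0) = 0.
    """
    z_run = 0.0  # Z_{N+1} = 0
    m_run = 0.0  # unnormalized <m>_{N+1} = 0
    for i in range(len(eps), 0, -1):
        w = math.exp(-eps[i - 1])
        z_run = w * (1.0 + z_run)  # Z_i = exp(-eps_i) (1 + Z_{i+1})
        m_run = w * (i + m_run)    # <m>_i = exp(-eps_i) (i + <m>_{i+1})
    z = 1.0 + z_run                # add the m = 0 term, exp[-E(0)] = 1
    return z, m_run / z


def direct_average(eps):
    """Brute-force evaluation of the same sums, used only as a check."""
    energies = [0.0]
    for e in eps:
        energies.append(energies[-1] + e)
    weights = [math.exp(-energy) for energy in energies]
    z = sum(weights)
    return z, sum(m * w for m, w in enumerate(weights)) / z
```

For sequences as long as those used in the simulations, and for small $f$, the Boltzmann weights can exceed the floating-point range, so a practical implementation would presumably need to rescale the running sums or work with logarithms; that refinement is omitted from this sketch.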

The binned data in Figures 11 and 12 represent the output of several thousand runs with independently chosen
random sequences and varying values of $\Delta$ and $N$. In Figure 11, which plots the distribution of $\langle m \rangle$, data points for
each value of $f$ from each run were rescaled appropriately and used together to construct the histogram. Similarly,
all pairs of points with $f_2/f_1 \approx 0.77$ were rescaled and used in making the histogram of $m_{\rm jump}$ in Figure 12; in order
to account for the predicted delta function at $m_{\rm jump} = 0$, a fraction $f_2/f_1$ of the total number of data points was
subtracted from the number of counts in the bin that included $m_{\rm jump} = 0$.
FIG. 1. Sketch of the DNA unzipping experiment. One of the single strands of a dsDNA molecule with a random base
sequence is attached by its end to a solid surface, and the other is pulled away from the surface with a constant force $F$. As
a result, the double strand partially denatures, separating $m$ base pairs ($m = 2$ in the figure). The distance between the ends
of the two single strands, or extension, is $r$. Inset: Schematic phase diagram in the temperature-pulling force ($T$-$F$) plane of
a dsDNA molecule in three dimensions. At low enough $T$ and $F$, the polymer is in the native, double-stranded phase. At the
phase transition line $F_c(T)$, the DNA denatures and the two strands separate. Thermally-induced melting occurs at zero force
at a temperature $T_m$. As indicated by the arrow, this paper considers instead the unzipping transition, in which the phase
transition line is crossed at non-zero $F$. The reentrance at low temperatures is predicted in pjj.
FIG. 2. Schematic of dsDNA unzipping through a narrow pore pa]. The pore is assumed to be large enough that
single-stranded DNA, but not double-stranded DNA, can fit through it. Under the influence of an electric field or comparable
force $F$, one single strand inserts into the channel and is gradually pulled through. As the strand is drawn through the pore,
it must unzip from its complementary strand.
FIG. 3. Definition of the variables $c_i$ and $o_i$ in the Ising-like model [Eq. (hi)]. In this figure, 3 bases are open at the end of
the dsDNA. Counting the first open base as $n = 0$, the location of the first closed base is then $c_1 = 3$. Similarly, the next open
base is at $o_1 = 10$.
FIG. 4. Definition of the variables in the continuum model [Eq. (0)]. The distance between the ends of the two single strands
(the extension) is $r$, and the number of open bases is $m$. The bases are indexed by $n$; the separation between the two single
strands at the $n$th base pair from the end is given by $R(n)$.
FIG. 5. Sketch of the bulk free energies per base pair $g_0$ of the zipped phase and $2g(F)$ of the unzipped phase as a function
of the applied force $F$. These negative energies are measured relative to the free energy of a base pair at infinite separation
with $F = 0$. While $g_0$ is independent of $F$, $2g(F)$ decreases with increasing $F$. At a critical force value $F_c$, the zipped phase
becomes unstable relative to the unzipped phase, and a phase transition occurs. The equilibrium free energy per base pair as
a function of $F$ is given by the solid curves; the discontinuous change in slope at $F_c$ indicates a first order transition.
FIG. 6. Schematic of the base separation probability $P_\infty(R)$ below and above the first order unzipping transition (the
probability is expected to depend only on the radial distance $R$, not on angular coordinates). Below the transition, $P_\infty(R)$
decays quickly to zero beyond the range of the attractive potential $V(R)$. Above the transition, in contrast, it approaches a
constant non-zero value as $R \to \infty$.
FIG. 7. Log-log plot of Eq. (M4) for $\langle m \rangle$ as a function of $f = 2g(F) - g_0 \sim F_c - F$. For large $f$, the plot has slope $-1$, but it
crosses over to slope $-2$ at $f \sim \Delta/k_B T$.






FIG. 8. Log-log plot of the average number of bases opened $\langle m \rangle$ (closed symbols) and the location of the absolute minimum
$m_{\rm min}$ of $\mathcal{E}(m)$ (open symbols) as a function of the distance $f$ from the unzipping transition. Both variables are plotted for
each of four individual polymers, represented by four different symbol shapes, with independently chosen random sequences
(variance $\Delta = 9(k_B T)^2$) of length $N = 5 \times 10^6$ bases. Note that, except when $m = O(1)$, $\langle m \rangle$ and $m_{\rm min}$ coincide very well.
The energy landscapes for the four duplexes are plotted for a particular value of $f$ in Fig. 9.
FIG. 9. Plot of four different random realizations of $\mathcal{E}(m)$. All four random walks have the same variance $\Delta = 9(k_B T)^2$
and average bias $f = 0.0025 k_B T$. All four also pass below $\mathcal{E} = 0$, suggesting that, near the unzipping transition, dsDNA
molecules with random sequences will usually have energetic reasons to partially unzip. The four energy landscapes are taken
from the four polymers whose force-extension curves are shown in Figure 8; the solid, dashed, dotted, and long-dashed curves
correspond, respectively, to the circles, squares, diamonds, and triangles. In order to focus on regions where $\mathcal{E}(m)$ is near zero,
the landscapes for $m > 10^6$ are not shown.
FIG. 10. Plot illustrating the physical origin of jumps in extension versus force during unzipping. The two curves represent
random walks $\mathcal{E}(m)$ with identical random contributions $W(m)$, but different average biases $f_1 = 0.0087$ (upper curve) and
$f_2 = 0.0067$ (lower curve). As indicated by the arrows, in the upper curve, the absolute minimum $m_{\rm min}$ is at $m_{\rm min} \approx 5{,}000$,
while in the lower curve, it is at $m_{\rm min} \approx 445{,}000$. As $f$ is tuned from $f_1$ down to $f_2$, $m_{\rm min}$ and thus $\langle m \rangle$ jump from one
minimum to the other.
FIG. 11. Log-linear plot of the distribution over different random sequences of the average number of opened bases $\langle m \rangle$.
The horizontal axis gives $\langle m \rangle$, suitably rescaled so that random sequences with different values of $f$ and $\Delta$ can be compared.
The vertical axis shows the log of the probability of seeing a particular $\langle m \rangle$. The squares represent binned data from numerical
simulations (described in the Appendix), the solid curve the analytic prediction of Eq. (fell) based on the assumption that
$\langle m \rangle = m_{\rm min}$. This prediction has no adjustable parameters. The scatter seen for large $\langle m \rangle f^2/\Delta$ is the result of counting noise.
FIG. 12. Log-linear plot of the distribution of jumps $m_{\rm jump}$ for $f_2/f_1 \approx 0.77$. $m_{\rm jump} f^2/\Delta$ is plotted on the horizontal axis,
the log of the probability of $m_{\rm jump}$ on the vertical axis. The points represent binned data from numerical simulations (described
in the Appendix), the solid curve an analytic prediction (no adjustable parameters) based on the assumption that $\langle m \rangle = m_{\rm min}$.
The scatter seen for large $m_{\rm jump} f^2/\Delta$ is the result of counting noise.
FIG. 13. Plot illustrating the recovery of the disorder-averaged scaling law $\overline{\langle m \rangle} \sim \Delta/2f^2$ in the force-extension curve of a
single random heteropolymer. The points give $\langle m \rangle$ as a function of $f$ for a single polymer; the solid line is the best-fit power
law, with exponent $-1.96 \pm 0.12$.
FIG. 14. The energy landscape $\mathcal{E}(m)$ for unzipping bacteriophage lambda DNA at two different biases. In this figure, the
base pairs are opened in the reverse of the conventional fell order, starting with base number 48502. Base pairing and stacking
energies are taken from [33] and are scaled by $k_B T$, with $T = 37^\circ$C $= 310$ K. The biases $f_1$ and $f_2$ are the locations of the two
jumps marked in the force-extension curve of Figure 15. The locations of the two minima that exchange stability at each bias
are indicated by arrows. Note the difference in scales between the upper and lower plots.
FIG. 15. Log-log plot of the average number of open bases $\langle m \rangle$ versus bias $f$ for unzipping bacteriophage lambda DNA. The
energy $\mathcal{E}(m)$ is as in Fig. 14. The three plateaus correspond to the minima of $\mathcal{E}(m)$ at $m \approx 1$, $m \approx 1500$, and $m \approx 26{,}000$;
the jumps between them occur at biases $f_1$ and $f_2$ as indicated in the figure. Assuming freely jointed chain elasticity for
ssDNA [Eq. ml)], with $b = 1.5$ nm |B2|, the definition of $f$ [Eq. (12)] implies that these biases correspond respectively to forces
of $F_1 = 7.90$ pN and $F_2 = 8.14$ pN. The middle plateau is actually subdivided into three smaller plateaus, separated by jumps
between nearby minima. Similarly, a local minimum at $m = 60$ is the most stable for a small range of $f$ between the plateaus
at $m \approx 1$ and $m \approx 1500$.
FIG. 16. Schematic energy landscape for a designed oligonucleotide duplex that could be used to measure base pairing and
stacking energies. The duplex is chosen to have stronger base pairs near the end from which it is opened, and weaker base
pairs at the far end. The energy of opening $\mathcal{E}(m)$ thus first slopes upwards, then downwards, and the only two minima occur
for a completely unzipped and completely zipped ($m = 0$) duplex. As the bias is tuned through the unzipping transition, the
two minima exchange stability, giving rise to a sharp unzipping transition (see Fig. 17).
FIG. 17. Force extension plot for a designed oligonucleotide duplex that could be used to measure base pairing and stacking
energies (see Fig. 16). Starting from the end from which it is being unzipped, the duplex has sequence 5'-(A)$_{20}$(C)$_{20}$-3', with
base pairing energies taken from [33]. The sharp unzipping transition allows an accurate measurement of $F_c = 8.32$ pN, and
thus of the energies stabilizing the duplex. Forces are calculated assuming that ssDNA is a freely jointed chain [Eq. (M)], with
Kuhn length $b = 1.5$ nm H.
FIG. 18. Sketch of several RNA stems being opened in parallel, as might occur in a micromechanical experiment on a
ribozyme or other folded RNA molecule. If each stem has an independently chosen random sequence, then in the limit of
a large number of long stems, the number of unzipped bases will equal the disorder-averaged value $\overline{\langle m \rangle}$. The measured
force-extension curve must then be smooth and monotonic in any ensemble.